Abstract: This report presents an algorithm to statically schedule live and strongly connected
Marked Graphs (MG). The proposed algorithm computes the best execution, in which the execution
rate is maximal and place sizes are minimal. The proposed algorithm provides transition schedules
represented as binary words. These words are chosen to be balanced. The contributions of this
paper are the proposed algorithm itself and the characterization of the best execution of any
MG.
Periodic scheduling of marked graphs using balanced binary words

Summary: This report presents an algorithm to statically schedule a live and strongly
connected marked graph. The proposed algorithm computes the best execution, for which
the effective throughput is maximal and the size of the places is minimal. The proposed
algorithm provides the schedule of each computation node in the form of binary words.
These words are chosen to be balanced. The contributions of the report are both the
proposed algorithm itself and the characterization of the best execution of a marked graph.

Keywords: marked graph, scheduling, balanced binary word
1 Introduction

In the System-on-Chip design domain, the trend is component based design. A new design is
assembled from IP components which are interconnected through a network of point-to-point
communication channels. In this area, the problem of long wire communication latency has
emerged as a limitation |2n|. A channel is not able to forward a datum in a single step but
requires many.

To solve this problem, a component based design has to be provided with a scheduling
that takes care of the latency issues. Luca Carloni et al. have proposed the theory of Latency
Insensitive Design (LID) [14] as a dynamic scheduling solution, but LID is greedy in buffering
elements. From our initial attempt to improve LID [12], we have established that a component
based design along with its latency issues can be modeled using Marked/Event graphs (MG)
[17]. Consequently, from the challenge of scheduling a System-on-Chip design, we arrive at the
more general and abstract challenge of scheduling an MG with respect to communication and
computation latencies.

RR n° 7891, Millo & de Simone

To address this challenge, we have developed the proposed algorithm, which provides a statically
computed execution for any live and strongly connected MG. The proposed algorithm can
potentially be applied to any system (software, hardware, production chain) which can be abstracted
as an MG with fixed communication and computation latencies.

It is clear from historical results [5] [13] that a live MG always admits an execution
irrespective of the communication or computation latencies. The proposed algorithm consists
in computing the best ASAP execution, where the execution rate is maximal and place sizes are
minimal. These properties match the requirements encountered in the domain of System-on-Chip
design [14]. Lastly, the proposed algorithm is extended to simply connected MGs.
However, the validity of the computed execution then relies on the on-demand availability of tokens
on global inputs.

Apart from the proposed algorithm itself, the main contribution of the paper is the characterization
of this best ASAP execution. From the initial marking, a guided execution can lead to
different markings. From each of these markings, the ASAP execution will be different and token
accumulation in the places may vary. For example, in a given ASAP execution, a transition may
fire all its tokens in sequence and then stall for the rest of the period, while in another ASAP
execution, the same transition is fired every two instants. The first case promotes token
accumulation. Within this set of ASAP executions, the one with the smallest token accumulation
is called the balanced ASAP execution. This execution always exists and can be analytically
computed for any MG. In a balanced ASAP execution, the binary words that represent the
activities of the transitions through time (1 for activity, 0 for inactivity) are all balanced.
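The contrast between the two schedules mentioned above can be checked mechanically. The sketch below assumes the usual definition from combinatorics on words (a binary word is balanced when any two factors of the same length differ by at most one in their number of 1s). As a further assumption, it treats a finite schedule as one period of an infinite repetition, so factors are read cyclically. The function name `is_balanced_cyclic` is hypothetical.

```python
def is_balanced_cyclic(word: str) -> bool:
    """Return True if the infinite repetition of `word` is balanced.

    Assumption: 'balanced' means that any two factors of the same
    length differ by at most one in their number of 1s. Factors are
    read cyclically because `word` stands for one period of a schedule.
    """
    n = len(word)
    bits = [int(ch) for ch in word]
    doubled = bits + bits  # cyclic factors become plain slices
    for length in range(1, n):
        counts = {sum(doubled[start:start + length]) for start in range(n)}
        if max(counts) - min(counts) > 1:
            return False
    return True


# Both schedules fire three times in six instants (rate 1/2).
bursty = "111000"   # fires all tokens in sequence, then stalls
regular = "101010"  # fires every two instants
print(is_balanced_cyclic(bursty), is_balanced_cyclic(regular))  # False True
```

Only lengths below the period need to be examined, since a longer factor of the repeated word is a whole number of periods plus a shorter factor.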

Related works. Marked graphs have been a well-studied domain for more than forty years and many
works are closely related to ours. [18] states the notions but also some results used in this paper.
[13] and [B] are the bases of our scheduling theory.

Historically, some works related to the notion of balancedness can be found in a publication
of Jean Bernoulli in the 18th century [7]. Then they appeared as Christoffel words in the 19th
century [TS]. More recently, Christoffel words appear again in [15], and as Sturmian words in [5] IH],
or as mechanical words in (Sj. [S] records the history of balanced binary words.

Balanced binary words have been used to balance the load of Erlang networks. In [4], the authors
try to minimize data loss in a graph with fixed storage capacities by optimally routing data
through communication channels using balanced binary words.

Outline. Section 2 runs the proposed algorithm on an example. Section 3 presents the MG
definition followed by all the required results about static analysis of MGs. Balanced binary
words are presented and studied in Section 4. The proposed algorithm is presented in Section 5,
followed by the proofs of correctness, and then Section 6 discusses our results.

2 Algorithm overview 

This section gives an informal overview of the major steps of the proposed algorithm. The
vocabulary used is formally defined in the following sections; it mostly follows the usual and
accepted definitions found in the literature.
Algorithm inputs and outputs. The inputs of the proposed algorithm are the live and strongly
connected MG and its initial marking (initial token positions). Its outputs
are the computed execution and the size of the places for this execution.



Latency expansion and ℕ-equalization. In the MG presented in Figure 1-a, the transition
(rectangle) on the top has a computation latency of 1. The right-most place (oval) has a
communication latency of 3. Usually, a token goes through a transition instantaneously and through a
place in one step. When the computation latency is different from 0, the tokens are kept for some
time in the transition. Similarly, the tokens are kept longer in a place when its communication
latency is more than 1.




Figure 1: a) an MG with a computation latency on the top-most transition and a communication
latency on the right-most place. Its expansion gives the plain MG in b). The MG in c) is the
ℕ-equalized version of b).



In this representation, the evolution of tokens during the MG execution is not obvious. For example,
in a place with a communication latency of 3, some tokens could have been there for 1 instant
while others have been there for more than 3 instants. The duration of their stay is not explicit.

To avoid this problem, the vertices with latencies are expanded into sequences of plain vertices
such that the "semantics of the latency" is preserved. A place with a communication latency n is
replaced by n successive places, while a transition with a computation latency m is replaced by
m + 1 transitions interleaved with m places. Thanks to this transformation, the exact location
of tokens is known. Figure 1-b is the expansion of Figure 1-a.
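The expansion rule can be sketched in a few lines. The encoding below is hypothetical: a place maps to `(source, target, tokens, latency)`, and the generated names (`t#i`, `name.fwd{i}`, `name.{i}`) are illustrative. The report does not say where initial tokens go after expansion; the sketch assumes they stay in the first place of the chain, and it assumes every communication latency is at least 1.

```python
from typing import Dict, List, Tuple

# A place is (source transition, target transition, tokens, latency).
Place = Tuple[str, str, int, int]


def expand_latencies(
    places: Dict[str, Place], comp_latency: Dict[str, int]
) -> Tuple[List[str], Dict[str, Tuple[str, str, int]]]:
    """Expand latencies into plain transitions and places.

    A transition t with computation latency m becomes the chain
    t#0 -> place -> t#1 -> ... -> t#m (m + 1 transitions, m places).
    A place with communication latency n becomes n places in a row.
    Assumption (not from the report): initial tokens are kept in the
    first place of the chain that replaces their original place.
    """
    transitions: List[str] = []
    plain: Dict[str, Tuple[str, str, int]] = {}

    # Collect transitions from both the latency table and the arcs.
    names = set(comp_latency)
    for src, dst, _, _ in places.values():
        names.update((src, dst))

    for t in sorted(names):
        m = comp_latency.get(t, 0)
        chain = [f"{t}#{i}" for i in range(m + 1)]
        transitions.extend(chain)
        for i in range(m):
            plain[f"{t}.calc{i}"] = (chain[i], chain[i + 1], 0)

    for name, (src, dst, tokens, n) in sorted(places.items()):
        # Arcs leave the last copy of src and enter the first copy of dst.
        first = f"{src}#{comp_latency.get(src, 0)}"
        last = f"{dst}#0"
        hops = [first] + [f"{name}.fwd{i}" for i in range(1, n)] + [last]
        transitions.extend(hops[1:-1])
        for i in range(n):
            plain[f"{name}.{i}"] = (hops[i], hops[i + 1], tokens if i == 0 else 0)

    return transitions, plain


# Two-transition loop: t1 has computation latency 1, place "a" has latency 3.
example_places = {"a": ("t1", "t2", 1, 3), "b": ("t2", "t1", 1, 1)}
transitions, plain = expand_latencies(example_places, {"t1": 1})
print(len(transitions), len(plain))  # 5 5
```

In this small example, the expanded graph has 5 transitions and 5 places, and the two initial tokens are preserved.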

In every ASAP execution reachable from the initial marking (after guided initialization),
token accumulation mostly occurs in the same places. In the MG in Figure 1-b, token
accumulation occurs in the left-most place. When the accumulation is such that every token is kept at
least 2 instants in the place, the behavior of the place is similar to one with a communication
latency of 2. Thus it can be expanded. The ℕ-equalization [T^] detects these places analytically
and increases their latencies accordingly. In Figure 1-c, the MG is the ℕ-equalized version of the
one presented in Figure 1-b.
Running the proposed algorithm on an example. The proposed algorithm is defined for
an ℕ-equalized MG where the latencies have been expanded. These steps are considered to be
the preliminary steps of the proposed algorithm.

Figure 2: The binary words associated with the transitions express the balanced ASAP execution
of the MG introduced in Figure 1-c.



Even in an ℕ-equalized MG, token accumulation occurs. In some of the ASAP executions
(reached after guided initialization), the accumulation is very limited while in others, many tokens
can be grouped in the same place. The balanced ASAP execution ($Exec_{periodic}$) has the lowest
accumulation. Figure 2 presents the schedule of every transition according to $Exec_{periodic}$ (0
means inactivity, 1 means activity). The first main step of the proposed algorithm computes
$Exec_{periodic}$ analytically.

In Figure 2, the marking from which $Exec_{periodic}$ occurs is called $M_{periodic}$. It is different
from the initial marking ($M_0$) (Figure 1-c). The second main step of the proposed algorithm
computes $M_{periodic}$.




Figure 3: From $M_0$ on the left to $M_{periodic}$ on the right following the initial guided execution.

The third main step of the proposed algorithm consists in finding the guided initialization
($Exec_{initial}$) leading to $M_{periodic}$ from $M_0$. In Figure 3, the 2-bit schedules attached to
each transition are $Exec_{initial}$.

As one can see in Figure 4, the computed execution is $Exec_{initial}$ followed by the infinite
repetition ($\omega$) of $Exec_{periodic}$. It guarantees a maximal execution rate and a minimal accumulation
of tokens. The proposed algorithm guarantees that place sizes are either 1 or 2. In the running
example, every place size is 1.

Figure 4: The MG in $M_0$ and the execution computed by the proposed algorithm.



3 Marked graph 

This section presents the Marked Graph (MG) model, also known as Event Graph, along with
classical definitions and results that will be used in the sequel. Our contributions in this section
are the notion of delay presented in Section 3.5 and Theorem 24.



An MG is a graph whose vertices are of two types: transitions and places. A place can
store tokens. The edges of an MG are called arcs. They cannot connect two vertices of the same
type. A source is a transition without incoming arc. A sink is a transition without outgoing arc.

Definition 1 (Marked Graph). A marked graph is a structure $G = (T, P, F, M_0)$ where

• $T$ is a set of transitions.

• $P$ is a set of places.

• $F \subseteq (T \times P) \cup (P \times T)$ is a set of arcs. If $t \in T$ and $p \in P$, $(t, p)$ and $(p, t)$ are two arcs
resp. from $t$ to $p$ and from $p$ to $t$.

• $M : P \to \mathbb{N}$ is a marking. $M_0$ is its initial marking.

• Each place has exactly one incoming and one outgoing arc: $\forall p \in P$,
$|\{(t, p) \in F \mid t \in T\}| = |\{(p, t) \in F \mid t \in T\}| = 1$.

The constraint on the number of place inputs and outputs guarantees that a token can be
used by only one transition. Consequently, the MG is said to be conflict-free or deterministic. Figure
1-b presents an MG with 7 transitions (rectangles) and 8 places (ovals). 5 of these places contain
one token (black dots).
Notation 2 (Predecessor, successor). Let $G$ be an MG, $t \in T$ and $p \in P$. We note:

• ${}^{\bullet}t$ is the preset of $t$, ${}^{\bullet}t = \{p \mid (p, t) \in F\}$.

• $t^{\bullet}$ is the postset of $t$, $t^{\bullet} = \{p \mid (t, p) \in F\}$.

• ${}^{\bullet}p$ is the transition which precedes $p$, ${}^{\bullet}p = t$ such that $(t, p) \in F$.

• $p^{\bullet}$ is the transition which succeeds $p$, $p^{\bullet} = t$ such that $(p, t) \in F$.

Definition 3 (Throughput of an MG, critical element). Let $G$ be an MG and $p$ be a place of
$G$. A cycle $c$ is a path from $p$ to $p$. It is called elementary if all the transitions of the cycle
are different. The marking of $c$ is $M(c) = \sum_{p \in c} M(p)$ and the latency of $c$, denoted $L(c)$, is the
number of places on $c$. The value $M(c)/L(c)$ is the throughput of $c$. The cycle(s) with the lowest
throughput is (are) said to be critical and the throughput of the MG is that of the critical cycle(s).
The transitions, arcs and places are said to be critical if they belong to a critical cycle.
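For small graphs, the throughput of Definition 3 can be obtained by brute force. The sketch below uses a hypothetical encoding in which each place maps to `(source, target, tokens)`. It enumerates elementary cycles by depth-first search and takes the lowest ratio; it is an illustration on a made-up graph, not the method used by the report, and it does not scale to large graphs.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# place name -> (source transition, target transition, tokens)
Places = Dict[str, Tuple[str, str, int]]


def elementary_cycles(places: Places) -> List[List[str]]:
    """Enumerate elementary cycles as lists of place names (small graphs only)."""
    out: Dict[str, List[Tuple[str, str]]] = {}
    for name, (src, dst, _) in places.items():
        out.setdefault(src, []).append((name, dst))
    cycles: List[List[str]] = []

    def walk(start: str, current: str, path: List[str], seen: set) -> None:
        for name, nxt in out.get(current, []):
            if nxt == start:
                cycles.append(path + [name])
            elif nxt not in seen and nxt > start:
                # Only visit transitions "larger" than start, so each
                # cycle is reported once, from its smallest transition.
                walk(start, nxt, path + [name], seen | {nxt})

    for t in sorted(out):
        walk(t, t, [], {t})
    return cycles


def throughput(places: Places) -> Fraction:
    """Lowest M(c)/L(c) over all elementary cycles (Definition 3)."""
    rates = []
    for cycle in elementary_cycles(places):
        tokens = sum(places[p][2] for p in cycle)
        rates.append(Fraction(tokens, len(cycle)))
    return min(rates)  # raises ValueError if the graph has no cycle


# Two cycles sharing t0: {a, b} has throughput 2/2, {c, d, e} has 2/3.
two_cycles = {
    "a": ("t0", "t1", 1), "b": ("t1", "t0", 1),
    "c": ("t0", "t2", 1), "d": ("t2", "t3", 0), "e": ("t3", "t0", 1),
}
print(throughput(two_cycles))  # 2/3
```

Here the cycle through `c`, `d` and `e` is critical, so the throughput of the whole graph is 2/3.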

An MG is closed if it has neither source nor sink, and it is connected if there exists a path, in
the underlying undirected graph, relating any pair of vertices. It is strongly connected if there
exists a path, in the MG itself, relating any pair of vertices. A strongly connected component
(SCC) of an MG is a subgraph that is strongly connected (a subgraph of an MG is an MG
composed of a subset of $T$, a subset of $P$, and a subset of $F$); it is said to be critical (CSCC) if all its
elements are critical. A directed acyclic component (DAC) is a subgraph that does not contain
any cycle. In general, a connected MG is composed of DACs relating SCCs together. A strongly
connected MG is always closed.

3.1 Semantics of execution of an MG 

We define an execution semantics of an MG based on a logical time with a synchronous semantics.
At instant 0, the MG is in its initial marking. Then, an execution step leads to another
marking at instant 1, and so on. During a single execution step, many firable transitions can be
fired simultaneously (synchronously) but a single transition can be fired only once.

Definition 4 (Firable transition at a marking $M$ in an MG). In an MG $G$, a transition $t \in T$ is
firable at a marking $M$ if $\forall p \in {}^{\bullet}t$, $M(p) > 0$. A source is always firable. $F_M$ is the set of firable
transitions at a marking $M$.

Definition 5 (MG execution model). Let $G$ be an MG and $M$ its current marking. An execution
step is a transition relation from $M$ to $M'$ denoted $M \xrightarrow{FT} M'$ with $FT \subseteq F_M$ and, $\forall p \in P$,
$M'(p) = M(p) + FT({}^{\bullet}p) - FT(p^{\bullet})$ ($FT(t) = 1$ if and only if $t \in FT$, $FT(t) = 0$ otherwise).
An execution ($Exec$) of an MG is a finite or infinite sequence of execution steps:
$Exec = M_0 \xrightarrow{FT_1} M_1 \xrightarrow{FT_2} M_2 \xrightarrow{FT_3} \cdots \xrightarrow{FT_i} M_i \cdots$ where $FT_i \subseteq F_{M_{i-1}}$.

Notation 6 (Concatenation of execution). Let $G$ be an MG. Let $Exec_0$ be a finite execution of
$G$ from the marking $M_0$ to the marking $M_1$ and $Exec_1$ be a finite or infinite execution of $G$ from
the marking $M_1$.

$Exec_0.Exec_1$ is the execution of $G$ formed by $Exec_0$ followed by $Exec_1$.

Notation 7 (ASAP and guided executions). Let $G$ be an MG. An execution of $G$ is said to be As
Soon As Possible (ASAP) if and only if $\forall i$, $FT_i = F_{M_{i-1}}$ (all firable transitions are always fired).
An execution of $G$ is said to be guided if and only if $FT_i \subseteq F_{M_{i-1}}$. In a guided execution, one has to
decide which firable transitions are fired at every step.
Definition 8 (Scheduling and schedule). Let $G$ be an MG with an execution $Exec$. Let $t \in T$
be a transition of $G$. The schedule of $t$ is the binary word relating the activity of $t$:
$Sched(t) = FT_1(t).FT_2(t) \cdots FT_i(t) \cdots$. The scheduling of $G$ for an execution $Exec$ is the mapping
$t \mapsto Sched(t)$, $\forall t \in T$.

Remark 9 (Scheduling and execution). The successive steps of an execution can be deduced
from its scheduling. Consequently, a scheduling defines an execution and vice versa.

As we have seen in Section 2, the proposed algorithm computes an ASAP execution by
computing the schedule of every transition.
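The execution model and the notion of schedule can be illustrated with a direct simulation. The sketch below is not the proposed algorithm (which computes the schedules analytically); it simply applies the step rule of Definition 5 with every firable transition fired, and records Sched(t) as a string. The dictionary encoding `name -> (source, target, tokens)` and the function name are hypothetical.

```python
from typing import Dict, Tuple

Places = Dict[str, Tuple[str, str, int]]  # name -> (source, target, tokens)


def asap_schedules(places: Places, steps: int) -> Dict[str, str]:
    """Run `steps` ASAP steps and return Sched(t) as a binary word per transition."""
    marking = {name: tokens for name, (_, _, tokens) in places.items()}
    transitions = sorted({t for src, dst, _ in places.values() for t in (src, dst)})
    preset = {t: [p for p, (_, dst, _) in places.items() if dst == t] for t in transitions}
    words = {t: "" for t in transitions}

    for _ in range(steps):
        # ASAP: fire every transition whose input places all hold a token.
        fired = {t for t in transitions if all(marking[p] > 0 for p in preset[t])}
        # M'(p) = M(p) + FT(pre(p)) - FT(post(p)), as in Definition 5.
        for p, (src, dst, _) in places.items():
            marking[p] += (src in fired) - (dst in fired)
        for t in transitions:
            words[t] += "1" if t in fired else "0"
    return words


# A ring of three transitions holding a single token in p0.
ring = {"p0": ("t0", "t1", 1), "p1": ("t1", "t2", 0), "p2": ("t2", "t0", 0)}
print(asap_schedules(ring, 6))
# {'t0': '001001', 't1': '100100', 't2': '010010'}
```

Each transition of the ring fires once every three instants, which matches the throughput 1/3 of its only cycle.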

3.2 Classical results 

Definition 10 (Liveness). An MG is live if there exists an execution where every transition is
fired infinitely often.

In [17], the authors show that the number of tokens on a cycle remains constant through
execution. They deduce that an MG is live iff all its cycles contain at least one token.
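The liveness condition lends itself to a simple test: every cycle contains a token exactly when the graph restricted to token-free places has no cycle. The sketch below checks this with a topological sort. The encoding `name -> (source, target, tokens)` and the function name `is_live` are hypothetical, and the reformulation in terms of token-free places is ours.

```python
from collections import deque
from typing import Dict, Tuple

Places = Dict[str, Tuple[str, str, int]]  # name -> (source, target, tokens)


def is_live(places: Places) -> bool:
    """True iff every cycle holds at least one token.

    Equivalent test: the graph restricted to token-free places is acyclic,
    which is checked here with Kahn's topological sort.
    """
    transitions = {t for src, dst, _ in places.values() for t in (src, dst)}
    successors = {t: [] for t in transitions}
    indegree = {t: 0 for t in transitions}
    for src, dst, tokens in places.values():
        if tokens == 0:
            successors[src].append(dst)
            indegree[dst] += 1

    ready = deque(t for t in transitions if indegree[t] == 0)
    removed = 0
    while ready:
        t = ready.popleft()
        removed += 1
        for nxt in successors[t]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Any transition left over lies on, or downstream of, a token-free cycle.
    return removed == len(transitions)


live_ring = {"p0": ("t0", "t1", 1), "p1": ("t1", "t2", 0), "p2": ("t2", "t0", 0)}
dead_ring = {"p0": ("t0", "t1", 0), "p1": ("t1", "t2", 0), "p2": ("t2", "t0", 0)}
print(is_live(live_ring), is_live(dead_ring))  # True False
```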

Definition 11 (Mutually reachable marking). Let $G$ be a strongly connected MG, $M$ and $M'$
two markings of $G$. $M$ and $M'$ are mutually reachable if there exists an execution sequence from
$M$ to $M'$ and another from $M'$ to $M$.

In [18], the authors prove that two live markings, $M$ and $M'$, of the same strongly connected
MG, $G$, are mutually reachable (through a guided execution) if for every cycle $c$ of $G$, $M(c) = M'(c)$.

The execution computed by the proposed algorithm is composed of an
initial part followed by a steady part. The steady part is not reachable from the initial marking
through an ASAP execution. Thus the initial part is a finite guided execution from the initial
marking to the first marking of the steady part. This operation is possible because the two
markings are mutually reachable.

3.3 Execution rate 

In [13], the authors prove that the ASAP execution of a live and strongly connected MG is
ultimately repetitive following an execution pattern. Equation 1 shows the evolution of the
marking of a live and strongly connected MG. $M_0$ is the initial marking and the arrows are
ASAP execution steps.

$M_0 \rightarrow \cdots \rightarrow M' \underbrace{\rightarrow \cdots \rightarrow}_{p \text{ steps}} M' \rightarrow \cdots$    (1)

The period of the pattern is $p$ and the number of firings of every transition within a period is
$k$ (the periodicity). We say that the execution of the MG is $k$-periodic with a period $p$. In other
words, the execution rate is $k/p$. In [6], the authors give a formula to calculate the exact value
of the periodicity ($k$) and the period ($p$) of the ASAP execution of a closed MG. According to
this formula, the execution rate ($k/p$) equals the value of the throughput given in Definition 3.
Thus the throughput is the maximal execution rate of the MG since it is that of the ASAP
executions. This is the rate guaranteed by the proposed algorithm.

Proposition 12 (Maximal execution rate of an MG). The maximal reachable execution rate is
achieved by the ASAP execution of the MG.
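The ultimately repetitive behavior can be observed experimentally on small graphs. The sketch below does not use the formula of [6]; it simply runs the ASAP execution until a marking repeats and reads off the period p and the periodicity k. The encoding `name -> (source, target, tokens)`, the function name and the `max_steps` guard are hypothetical, and the graph is assumed to be live and strongly connected.

```python
from typing import Dict, Tuple

Places = Dict[str, Tuple[str, str, int]]  # name -> (source, target, tokens)


def asap_period(places: Places, max_steps: int = 10_000) -> Tuple[int, int]:
    """Return (k, p): firings per period and period of the ASAP execution.

    Sketch for a live, strongly connected MG: the ASAP execution is run
    until a marking repeats, which delimits one period.
    """
    marking = {name: tokens for name, (_, _, tokens) in places.items()}
    transitions = sorted({t for src, dst, _ in places.values() for t in (src, dst)})
    preset = {t: [p for p, (_, dst, _) in places.items() if dst == t] for t in transitions}
    count = {t: 0 for t in transitions}
    seen = {}  # marking -> (instant, firing counts at that instant)

    for instant in range(max_steps):
        key = tuple(sorted(marking.items()))
        if key in seen:
            first, old = seen[key]
            # In a strongly connected MG every transition fires equally
            # often between two occurrences of the same marking.
            return count[transitions[0]] - old[transitions[0]], instant - first
        seen[key] = (instant, dict(count))
        fired = {t for t in transitions if all(marking[p] > 0 for p in preset[t])}
        for p, (src, dst, _) in places.items():
            marking[p] += (src in fired) - (dst in fired)
        for t in fired:
            count[t] += 1
    raise RuntimeError("no repeated marking within max_steps")


# Two cycles sharing t0, with throughputs 2/2 and 2/3.
two_cycles = {
    "a": ("t0", "t1", 1), "b": ("t1", "t0", 1),
    "c": ("t0", "t2", 1), "d": ("t2", "t3", 0), "e": ("t3", "t0", 1),
}
print(asap_period(two_cycles))  # (2, 3)
```

On this made-up graph the observed rate k/p = 2/3 coincides with the throughput of the slowest cycle, as stated above.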
Remark 13 (Execution rate in an open MG). The result on the maximal execution rate is valid
for strongly connected MGs (and thus closed). In a simply connected open MG, the execution
rate depends upon the execution rate of the source(s), but the maximal execution rate is bounded
by the worst throughput of its SCCs and can be calculated using the same formula given in [6].
Consequently, if the source(s) fire(s) on demand, the MG can be considered closed.

In Section 5.3, the proposed algorithm is extended to simply connected MGs. In such a case,
the proposed algorithm returns the schedules of the sources and sinks. The schedule of a source
says when a token has to be generated by the source in order to feed the next transition and
ensure the overall consistency of the execution.



3.4 Size of places and boundedness 

As we have seen in Section 2, the proposed algorithm computes an execution which implies a
minimal size of places. Let us now define this notion.

Definition 14 (Size of places). Let $G$ be an MG, $Exec$ an execution and $p$ a place of $G$. The
size of $p$ on $Exec$, denoted $C_{Exec}(p)$, is the highest marking of $p$ during the entire execution.

From the initial marking of a strongly connected MG, a guided execution can lead to any of
the reachable markings. From each of these markings, there exists a bounded ASAP execution.
The size of the places for these executions may vary. An execution has a minimal size of places
if every place has a minimal size compared to the other ASAP executions.

Definition 15 (Minimal size of places). Let $G$ be an MG and $M_0$ its initial marking. Let $\mathbb{M}$ be
the set of markings reachable from $M_0$. Let $\mathbb{E}$ be the set of ASAP executions from the markings
of $\mathbb{M}$.

$Exec \in \mathbb{E}$ has a minimal size of places iff $\forall Exec' \in \mathbb{E}$, $\forall p \in G$, we have $C_{Exec}(p) \le C_{Exec'}(p)$.

As we have seen in Section 2, the proposed algorithm computes an ASAP execution where
place sizes are minimal. The extension of the proposed algorithm to simply connected MGs is
discussed in Section 5.3, but it suffers from limitations since the execution of a simply connected
MG may not be bounded.

Definition 16 (Boundedness). An execution is bounded if the size of every place is bounded. An
MG is bounded if every execution is bounded.

Whereas any SCC is bounded, the sizes of the places of a DAC are not. Let us assume an
MG composed of two SCCs connected through a DAC. If the throughput of the upper SCC is
higher than the throughput of the underneath SCC, the ASAP execution of the MG will lead to
an infinite accumulation of tokens in the DAC. On the contrary, if the throughput of the upper SCC
is lower than the throughput of the underneath SCC, the upper SCC will limit the execution rate
of the underneath SCC and the behavior of the MG during an ASAP execution will be ultimately
repetitive.

Proposition 17 (ASAP execution of a connected MG). The ASAP execution of a connected
MG may be unbounded.

In every bounded execution of a connected MG, all SCCs have the same execution rate.
Consequently, the highest reachable execution rate is equal to the worst throughput among the
SCCs. An execution at this rate will be ASAP for the SCCs with the worst throughput. The
execution will also be ASAP for the underneath SCCs. However, the upper SCCs will be slowed
down to avoid accumulation and thus will not have an ASAP behavior.
Proposition 18 (Bounded execution of a connected MG). The bounded execution of a connected
MG may not be ASAP.

Propositions 17 and 18 explain why the proposed algorithm is restricted to strongly
connected MGs. However, this restriction can be lifted as discussed in Section 5.3. A simply
connected MG can be transformed into a strongly connected one (by relating SCCs together) so
that the proposed algorithm is applicable.



3.5 Delay 

During the execution of an MG, at a given instant, if a token reaches a place $p$ and is not
consumed by $p^{\bullet}$ at the next instant, then the token is said to be delayed. This can happen in two
cases: 1) when $p^{\bullet}$ is not fired; all the tokens in $p$ are delayed. 2) When $p^{\bullet}$ is fired and $p$ contains
many tokens; all tokens in $p$ except the consumed one are delayed. Globally, delays can be seen
as a way for the MG to synchronize its branches together. A non-critical cycle tends to get
ahead of critical cycles (it is faster) but eventually, the execution rate is the same for every
cycle. So the delays dynamically reduce the execution rate of fast cycles to the execution rate of the
slowest. The value $M_{c_2} \cdot L_{c_1} - M_{c_1} \cdot L_{c_2}$ represents the number of delays required during
a period of execution to synchronize the cycle $c_1$ with the cycle $c_2$.

Definition 19 (Delay). Let $G$ be an MG and $p \in P$ be a place of $G$. Let $Exec$ be an execution
of $G$. Following the notation of Definition 5, $Delay(p, i)$ is the number of delays occurring in $p$
at the $i$-th step of $Exec$: $Delay(p, i) = M_{i-1}(p) - FT_i(p^{\bullet})$.
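The definition of delay can be evaluated step by step along a given execution. The sketch below reads the formula as "tokens present in p before step i, minus one if the successor of p fires at step i". The encoding `name -> (source, target, tokens)` and the function name are hypothetical, and the caller is assumed to supply firing sets that are valid for the execution model.

```python
from typing import Dict, List, Set, Tuple

Places = Dict[str, Tuple[str, str, int]]  # name -> (source, target, tokens)


def delays(places: Places, firings: List[Set[str]]) -> List[Dict[str, int]]:
    """Delay(p, i) = M_{i-1}(p) - FT_i(post(p)) for each step i = 1, 2, ...

    `firings[i - 1]` is the set FT_i; it is assumed to contain only
    transitions that are firable at M_{i-1}.
    """
    marking = {name: tokens for name, (_, _, tokens) in places.items()}
    result = []
    for fired in firings:
        # Tokens of p that are not consumed during this step are delayed.
        result.append({p: marking[p] - (dst in fired) for p, (_, dst, _) in places.items()})
        for p, (src, dst, _) in places.items():
            marking[p] += (src in fired) - (dst in fired)
    return result


# Two-transition loop with both tokens initially in place "a".
loop = {"a": ("t0", "t1", 2), "b": ("t1", "t0", 0)}
print(delays(loop, [{"t1"}, {"t0", "t1"}]))
# [{'a': 1, 'b': 0}, {'a': 0, 'b': 0}]
```

At the first step, only one of the two tokens of `a` is consumed, so one delay is counted; afterwards the loop runs without delay.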

After the initial part, the sum of delays in the places of a cycle during a period of execution 
reflects the difference of rate between a critical cycle and the current cycle. 

Theorem 20 (Delay in a cycle during a period of execution). Let $G$ be an MG and $c$ a cycle
of $G$. Let $Exec$ be an [ultimately] $k$-periodic execution of $G$ with a period $p$. Let $j_0$ be an upper
bound of the length of the initialization.

$\sum_{q \in c} \sum_{i=1}^{p} Delay(q, j_0 + i) = M(c) \cdot p - L(c) \cdot k$

Proof. In $c$, at each instant, $M(c)$ tokens are present. This means $M(c) \cdot p$ transitions could be
fired over a period of execution. However, in a period of execution, every transition of $G$ is fired
$k$ times. This means $L(c) \cdot k$ transitions are effectively fired on $c$ during a period of execution.
The difference between the number of possible firings and the number of effective
firings is the number of delays in $c$ over a period. □
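As a numerical illustration, take the throughputs quoted for Figure 1 in Section 3.7: 4/7 for the critical cycle and 2/3 for the left cycle. The concrete values below are assumptions made for the sake of the example: $k = 4$ and $p = 7$ for the execution, $M(c) = 2$ and $L(c) = 3$ for the left cycle, and $M(c_{crit}) = 4$ and $L(c_{crit}) = 7$ for the critical cycle.

```latex
\begin{align*}
M(c) \cdot p - L(c) \cdot k &= 2 \cdot 7 - 3 \cdot 4 = 2, \\
M(c_{crit}) \cdot p - L(c_{crit}) \cdot k &= 4 \cdot 7 - 7 \cdot 4 = 0.
\end{align*}
```

Under these assumptions, the left cycle has to absorb two delays per period while the critical cycle absorbs none.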

The spatial distribution of delays is the exact location where the delays occur during a period 
of execution. 

Definition 21 (Spatial distribution of delays). Let $G$ be an MG. Let $Exec$ be an [ultimately]
$k$-periodic execution of $G$ with a period $p$. $D : P \to \mathbb{N}$ is called a spatial distribution of delays if,
for every cycle $c$ of $G$, $\sum_{q \in c} D(q) = M(c) \cdot p - L(c) \cdot k$.

$Exec$ is said to be based on $D$ if, after the initial part, the delays in $Exec$ during a period of
execution occur as expressed in $D$.

In the specific case of an ASAP execution, the delays occur as late as possible in the MG. This
makes the corresponding spatial distribution of the delays unique for a given strongly connected
MG. This spatial distribution is called the "latest delays position". Theorems 23 and 24
prove these claims.
Definition 22 (Latest delays position). Let $G$ be a strongly connected MG. Let $D$ be a spatial
distribution of the delays in $G$. $D$ is the latest delays position if for every transition $t$ of $G$, there
exists at least one place $p$ in ${}^{\bullet}t$ such that $D(p) = 0$.

Theorem 23 (Existence of the latest delay position). Let $G$ be a strongly connected MG with a
throughput inferior or equal to 1. The latest delay position always exists for $G$.

Proof. A spatial distribution of delays $D$ can be deduced from a period of the ASAP execution
of $G$: $\forall q \in P$, $D(q) = \sum_{i=1}^{p} Delay(q, j_0 + i)$ where $j_0$ is the length of the initial part.

Either $D$ is the latest delay position or there exists at least one transition $t$ for which every
place in the preset of $t$ has at least $n$ delays (with $n \ge 1$). In the second case, $n$ delays can
be removed from every place in the preset of $t$ and added to every place in the postset of $t$. This
transformation gives another (valid) spatial distribution of delays for which $t$ has at least one
place in its preset without delay.

The iteration of this transformation reaches a fixed point because no delay happens on the
critical cycle. The fixed point is the latest delay position. One should note the similarity of this
argument to the liveness condition. □
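The transformation used in this proof can be written as a short loop. The sketch below is a hypothetical rendering: places are encoded as `name -> (source, target, tokens)`, the distribution is a dictionary from place names to delay counts, and n is taken as the smallest delay count found in the preset of the transition. Termination relies on the assumption, stated in the proof, that no delay sits on the critical cycle.

```python
from typing import Dict, Tuple

Places = Dict[str, Tuple[str, str, int]]  # name -> (source, target, tokens)


def latest_delay_position(places: Places, dist: Dict[str, int]) -> Dict[str, int]:
    """Push delays forward until every transition has a delay-free input place.

    Assumption: `dist` is a valid spatial distribution with no delay on
    the critical cycle, otherwise the loop may not terminate.
    """
    d = dict(dist)
    transitions = {t for src, dst, _ in places.values() for t in (src, dst)}
    changed = True
    while changed:
        changed = False
        for t in sorted(transitions):
            preset = [p for p, (_, dst, _) in places.items() if dst == t]
            postset = [p for p, (src, _, _) in places.items() if src == t]
            n = min((d[p] for p in preset), default=0)
            if n >= 1:
                # Move n delays across t: out of its preset, into its postset.
                for p in preset:
                    d[p] -= n
                for p in postset:
                    d[p] += n
                changed = True
    return d


# Fast cycle {a, b} and delay-free cycle {c, d} sharing transition t0.
graph = {
    "a": ("t0", "t1", 1), "b": ("t1", "t0", 1),
    "c": ("t0", "t2", 1), "d": ("t2", "t0", 0),
}
print(latest_delay_position(graph, {"a": 2, "b": 0, "c": 0, "d": 0}))
# {'a': 0, 'b': 2, 'c': 0, 'd': 0}
```

The two delays move from `a` to `b` and stop there, because `t0` already has the delay-free input place `d`. The number of delays on each cycle is unchanged by the transformation.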

Theorem 24 (Latest delay position and ASAP execution). Let $G$ be a strongly connected MG
with an execution $Exec$. $Exec$ is based on the spatial distribution of delays $D$. i) If $D$ is the
latest delays position, then $Exec$ is ASAP. ii) Let $Exec'$ be another ASAP execution of $G$ from
another initial marking $M_0'$. $Exec'$ is based on the spatial distribution of delays $D'$. If $M_0$ and
$M_0'$ are mutually reachable, then $D = D'$.

Proof. i) If $\forall t \in T$, $\exists p \in {}^{\bullet}t$ such that $D(p) = 0$, then $t$ fires as soon as $M(p) > 0$. This is an ASAP
execution.

ii) If $M_0$ and $M_0'$ are mutually reachable, the number of tokens per cycle is the same in $M_0$ and
$M_0'$ for every cycle of $G$ [18]. Consequently, the number of delays per cycle is the same in $D$ and
$D'$.

Now let us assume there exists a place $p$ such that $D(p) \neq D'(p)$. Let $path_1$ and $path_2$ be
two paths in the graph: $path_1$ goes from a transition of a critical cycle to ${}^{\bullet}p$ and $path_2$ goes
from $p^{\bullet}$ to a transition of a critical cycle. We assume without loss of generality that the number
of delays on $path_1$ is the same according to $D$ and $D'$. $path_1$ followed by $p$ followed by $path_2$
followed by a section of a critical cycle forms a cycle for which the number of delays is the same
according to $D$ and $D'$. Since $D(p) \neq D'(p)$, the number of delays on $path_2$ is different in $D$
and $D'$.

Since $D$ and $D'$ are the latest delays positions, there exists a path $path_0$ from a transition of
a critical cycle to $p^{\bullet}$ which does not contain any delay (this path can be constructed
by backtracking from $p$: when reaching a transition, the input place without delay is selected; a
critical cycle will ultimately be reached). But $path_0$ followed by $path_2$ followed by a section of
the same critical cycle forms a cycle where the number of delays is different according to $D$ and
$D'$. Hence $M_0$ and $M_0'$ are not mutually reachable, which is a contradiction. □

As we have seen in Section 2, the proposed algorithm computes an ASAP execution. This
ASAP execution is based on the latest delays position of $G$. In some sense, the proposed algorithm
proves that there always exists an ASAP execution based on the latest delay position.

3.6 Latencies 

The preliminary step of the proposed algorithm is the expansion of the vertices with latency into
plain vertices.

Definition 25 (MG with communication/computation latencies).
Let $G$ be an MG. A marked graph with latency $G'$ is a tuple $(G, L_{com}, L_{cal})$:

• The mapping $L_{com} : P \to \mathbb{N} \setminus \{0\}$ gives the communication latencies of places.

• The mapping $L_{cal} : T \to \mathbb{N}$ gives the computation latency of transitions (cal stands for
calculation).

A place with a communication latency of $n$ keeps every token at least $n$ instants. A transition
with a computation latency of $m$ keeps every token exactly $m$ instants. According to Definition 5,
the latency of a transition in a plain MG is 0 and the latency of a place is 1. The tokens go through
transitions instantaneously but stay at least one instant in a place. The transformation from an
MG with latencies to an MG without latency has been introduced by Chander Ramchandani in
[22]. This transformation preserves the semantics of a latency.

Figure 1-a presents an MG with a computation latency on the top transition and a communication
latency on the right-most place. Figure 1-b is the expansion of Figure 1-a. The top-most
transition is replaced by two transitions with a place in between which represents the
computation latency. The right-most place is replaced by three places interleaved with two transitions.
Each of the three places represents one unit of communication latency.

Liveness, closedness, (strong) connectedness, throughput, execution rate, number of cycles,
and number of tokens per cycle remain constant through the latency expansion process.

3.7 ℕ-equalization

In an MG where a cycle $c$ is much faster than the critical cycle, any ASAP execution will lead
to a situation where a place of $c$ will keep every token at least two instants. As a consequence, the
behavior of this place is exactly the same as that of two places in sequence with a dummy transition
in between. The ℕ-equalization performs this transformation wherever it is required. The MG
in Figure 1-c is the ℕ-equalized version of the MG in Figure 1-b.

The resulting ℕ-equalized MG has the same behavior as the original one but the throughput
of $c$ has changed. It has been reduced to approach the critical one but cannot become lower. It
may happen that some non-critical cycles become critical and that the values of $k$ and $p$ change,
but the ratio $k/p$ remains constant. The major expected change is that, for every place in the
resulting MG, the number of delays over a period becomes bounded by $k$. More details about
ℕ-equalization are available in [11] [12].

Definition 26 (ℕ-equalized MG). An MG $G$ is said to be ℕ-equalized if and only if every transition
belonging to a strongly connected component of $G$ belongs to a cycle $c$ such that:

$M(c)/(L(c) + 1) < throughput(G) \le throughput(c)$

Lemma 27 (Delay in an ℕ-equalized MG). Let $G$ be an ℕ-equalized MG. Let $p$ be a place of $G$.
Let $D$ be a spatial distribution of delays. $0 \le D(p) < k$ holds (where $k$ is the periodicity of $G$).

Proof. For every place of $G$, there is a cycle $c$ containing it such that $\sum_{q \in c} D(q) = M(c) \cdot p - L(c) \cdot k$.

Moreover, if $G$ is ℕ-equalized,
$M(c)/(L(c) + 1) < throughput(G) \le throughput(c)$
$\Leftrightarrow M(c)/(L(c) + 1) < k/p \le M(c)/L(c)$. The two inequations hold:

• $M(c)/(L(c) + 1) < k/p \Rightarrow M(c) \cdot p < (L(c) + 1) \cdot k \Rightarrow M(c) \cdot p - L(c) \cdot k < k$.

• $k/p \le M(c)/L(c) \Rightarrow M(c) \cdot p - k \cdot L(c) \ge 0$.

The two inequations can be merged in $0 \le M(c) \cdot p - L(c) \cdot k < k \Leftrightarrow 0 \le \sum_{q \in c} D(q) < k$. Even
if all the delays of the cycle are merged in one place, $D(p) < k$. □

The main difficulty of the ℕ-equalization comes from the interleaving of cycles in the MG. The addition of an extra place on a path may increase the latency of many cycles, and some of them can become slower than a critical cycle while some others still require extra places. Consequently, all the cycles have to be considered simultaneously to find the correct location of the additional places. In [11, 12], integer linear programming is used to specify all the ℕ-equalization constraints. A more elegant solution can be built based on (max, plus) algebra [6] by considering the incidence matrix of the MG and its evolution over a period.

In Figure 1, the ℕ-equalization may appear trivial because many places belong to only one cycle. The left cycle in Figure 1-b is faster than the right cycle, so an extra place can be added after the leftmost place. The critical (right) cycle has a throughput of 4/7. The left cycle has a throughput of 2/3. The inequality 2/(3 + 1) < 4/7 < 2/3 holds.
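The equalization condition and the delay count used in the proof of Lemma 27 can be checked numerically on the running example. The following is a minimal sketch, not part of the report: the function names are hypothetical, each cycle is summarized by its token count M(c) and latency L(c), and the right-hand inequality of Definition 26 is assumed to be non-strict so that critical cycles satisfy it.

```python
from fractions import Fraction

def is_equalized_cycle(tokens, latency, k, p):
    """Check M(c)/(L(c)+1) < k/p <= M(c)/L(c) for one cycle c."""
    rate = Fraction(k, p)  # throughput of the whole MG
    return Fraction(tokens, latency + 1) < rate <= Fraction(tokens, latency)

def delays_per_period(tokens, latency, k, p):
    """Delays absorbed by the cycle over one period: M(c)*p - L(c)*k."""
    return tokens * p - latency * k

# Cycles of the running example, given as (tokens, latency).
critical = (4, 7)  # right cycle, throughput 4/7
left = (2, 3)      # left cycle once the extra place is added, throughput 2/3
for tokens, latency in (critical, left):
    print(is_equalized_cycle(tokens, latency, 4, 7),
          delays_per_period(tokens, latency, 4, 7))
```

With these values, the left cycle has to absorb 2 delays per period while the critical cycle absorbs none, and both counts stay below k = 4.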




Figure 5: This MG is already ℕ-equalized.

Figure 5 presents a non-trivial example of ℕ-equalization. The outer cycle is critical with a throughput of 2/9. There are three cycles with a throughput of 1/4. Since 1/(4 + 1) < 2/9 < 1/4, the ℕ-equalization condition of Definition 26 holds. The inner cycle has a throughput of 1/3, so it seems that an extra place could be added to equalize it (1/(3 + 1) > 2/9), but every place of this cycle also belongs to another cycle with a throughput of 1/4. Consequently, the MG is already ℕ-equalized.

As we have seen in Section 2, the ℕ-equalization of an MG is the preliminary step of the proposed algorithm.



4 Balanced binary words 

This section presents the basic definitions and well-known results on balanced binary words (f3^). 
To our knowledge, Theorem 44, which presents the relation between the operations of rotation and transposition, is original. The goal of this section is to present all these results in a way that eases the comprehension of the proposed algorithm.

4.1 Finite and infinite binary words

As usual, the set of binary values is noted $\mathbb{B} = \{0, 1\}$, $\mathbb{B}^*$ the set of finite binary words, $\mathbb{B}^n$ the set of binary words of length n, $\mathbb{B}^+$ the set of non-empty finite binary words, $\mathbb{B}^\omega$ the set of infinite binary words, and $\varepsilon$ the empty word. We note $\mathbb{B}^\infty = \mathbb{B}^* \cup \mathbb{B}^\omega$ the set of finite or infinite binary words.

For $u \in \mathbb{B}^\infty$, we note $|u|$ the length of u (with $|u| = \infty$ whenever $u \in \mathbb{B}^\omega$). Similarly, we note $|u|_1$ and $|u|_0$ the number of occurrences of letters 1 and 0 in u respectively. Also, for $u \in \mathbb{B}^+$, we note slope(u) the ratio $|u|_1/|u|$, and $\mathbb{B}^p_k = \{u \mid u \in \mathbb{B}^p \text{ and } |u|_1 = k\}$. For $i \le |u|$, we note u(i) the i-th letter of u.

The lexicographic ordering on words is defined as: for $u, v \in \mathbb{B}^\infty$, $u < v$ iff $\exists i \in \mathbb{N}$, $\forall j < i$, $u(j) = v(j)$ and either $u(i) = 0$ and $v(i) = 1$, or $|u| = i - 1$ and $|v| \ge i$. This order is total. For any finite subset V of $\mathbb{B}^\infty$, inf(V) and sup(V) are respectively its lowest and highest elements for this ordering. Finally, for $u \in \mathbb{B}^*$ and $v \in \mathbb{B}^\infty$, u is a factor of v if $\exists u_1 \in \mathbb{B}^*, u_2 \in \mathbb{B}^\infty$ such that $v = u_1.u.u_2$.

Definition 28 (Ultimately k-periodic binary word). An infinite binary word is called ultimately k-periodic if it is of the form $u.v^\omega$, with $u \in \mathbb{B}^*$ and $v \in \mathbb{B}^+$ with $|v|_1 = k > 0$.

It is called simply k-periodic if in addition $u = \varepsilon$. It is called ultimately periodic if k = 1. It is called only periodic if both conditions occur. For an ultimately k-periodic word, u is called the initial part, v the steady part, $k = |v|_1$ is the periodicity, and $p = |v|$ is the period. By definition, $slope(u.v^\omega) = slope(v)$. P is the set of ultimately periodic infinite binary words and $P^p_k$ is the set of such words of periodicity k and period p.

Example 29. $11.(0110101)^\omega$ is 4-periodic with period 7, and so is in $P^7_4$.

Because the ASAP execution of an MG is ultimately periodic, the proposed algorithm mainly focuses on a single period of execution that is meant to be repeated indefinitely. Thus, the following results concern finite binary words. In the proposed algorithm, for each transition t of an MG, the appropriate words $v_t$ and $u_t$ are found and the ultimately k-periodic word $u_t.(v_t)^\omega$ is built to represent the schedule of t.



4.2 Rotation and transposition 

As we have seen in Section 2, the proposed algorithm computes the schedule of every transition of the MG. To do so, the schedule of a transition t is deduced from the schedule of one of its predecessors ($\{{}^\bullet p \mid p \in {}^\bullet t\}$) using the transposition and the rotation. In Section 4.6, we illustrate the link between the rotation and the effect of a latency on a schedule, as well as the link between the transposition and the effect of a delay on a schedule.

Definition 30 (Unitary forward rotation). The unitary forward rotation is defined as $\rho: \mathbb{B}^* \to \mathbb{B}^*$, $\rho(\varepsilon) = \varepsilon$, and $\forall u \in \mathbb{B}^*, \forall b \in \mathbb{B}$, $\rho(u.b) = b.u$.

Definition 31 (Rotation). Let $u \in \mathbb{B}^*$. We note $\rho^n(u)$ the n successive unitary forward rotations of u: $\rho^0(u) = u$, $\rho^1(u) = \rho(u)$, $\rho^n(u) = \rho^{n-1} \circ \rho(u)$ and $\rho^{-n}(u) = v$ when $u = \rho^n(v)$. The parameter n is called the spin of the rotation.

Example 32. $\rho^3(1101010) = 0101101$, $\rho^{-3}(1101010) = 1010110$ and, for a word u of length p, $\rho^p(u) = \rho^0(u) = u$.



Definition 33 (Orbit). Let $u \in \mathbb{B}^*$. The set of all rotations of u is called the orbit of u and is noted O(u).

Example 34. For u = 0110101, $O(u) = \{u, \rho^1(u), ..., \rho^6(u)\}$ = {0110101, 1011010, 0101101, 1010110, 0101011, 1010101, 1101010}.

Definition 35 (Transposition). Let $u, v \in \mathbb{B}^\infty$. v is called the unitary forward transpose of u (or simply transpose for short), noted $v = \tau(u, \lambda)$, iff $\exists u_1 \in \mathbb{B}^*$ and $\exists u_2 \in \mathbb{B}^\infty$ such that $u = u_1.1.0.u_2$, $v = u_1.0.1.u_2$, and $\lambda = |u_1| + 1$. $\lambda$ is called the location of the transposition. By definition, if u is finite and $u = 0.w_1.1$, then $\tau(u, |u|) = 1.w_1.0$.

Example 36. $\tau(1010101, 3) = 1001101$, $\tau(1101010, 3)$ is not defined, $\tau(011, 3) = 110$, $\tau((10101)^\omega, 3) = 10011.(10101)^\omega$ and $(\tau(10101, 3))^\omega = (10011)^\omega$.
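Definitions 30 to 35 translate directly into string operations on finite words. The sketch below is an illustration under stated assumptions: words are Python strings of '0' and '1', the function names are hypothetical, locations are 1-based as in Definition 35, and an undefined transposition is reported as `None`.

```python
def rotate(word, spin=1):
    """rho^spin: move the last `spin` letters to the front (negative spins rotate backward)."""
    if not word:
        return word
    s = spin % len(word)
    return word[-s:] + word[:-s] if s else word

def orbit(word):
    """Set of all rotations of `word`."""
    return {rotate(word, n) for n in range(len(word))}

def transpose(word, loc):
    """tau(word, loc): turn the factor '10' starting at 1-based position `loc` into '01'.

    On a finite word, loc == len(word) wraps around: 0.w.1 becomes 1.w.0.
    Returns None when the transposition is not defined at `loc`.
    """
    n = len(word)
    if loc == n and n >= 2 and word[0] == "0" and word[-1] == "1":
        return "1" + word[1:-1] + "0"
    i = loc - 1
    if 0 <= i < n - 1 and word[i:i + 2] == "10":
        return word[:i] + "01" + word[i + 2:]
    return None

print(rotate("1101010", 3), rotate("1101010", -3))   # rotations of Example 32
print(sorted(orbit("0110101")))                       # orbit of Example 34
print(transpose("1010101", 3), transpose("1101010", 3), transpose("011", 3))
```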

4.3 Balanced binary words 

The proposed algorithm computes an execution where all schedules are ultimately k-periodic 
balanced binary words with a period p. 

Definition 37 (Balanced binary word). A finite binary word $u \in \mathbb{B}^*$ is said to be balanced if, $\forall v, \forall t$ two factors of $u^\omega$ such that $|v| = |t|$, the following property holds: $-1 \le |v|_1 - |t|_1 \le 1$.

The set of finite balanced binary words of length p containing k occurrences of 1 is denoted by $S^p_k$. Also, $u \in S^p_k$ is said to be primitive when k and p are mutually prime. By extension, an ultimately periodic word is called balanced if its steady part is. We have chosen the letter S for Smooth.

In [19], the authors prove that i) in a balanced binary word u, the number of 1s in every factor of $u^\omega$ of length l is either $\lfloor l * |u|_1/|u| \rfloor$ or $\lceil l * |u|_1/|u| \rceil$, ii) all the balanced binary words with the same slope are equivalent by rotation (let $u, v \in S^p_k$, $O(u) = O(v) = S^p_k$), iii) $inf(S^p_k) = 0.u.1$ and $sup(S^p_k) = 1.u.0$ ($u \in \mathbb{B}^{p-2}$), and lastly iv) whenever k and p are not mutually prime, every balanced binary word in $S^p_k$ (called in this case non-primitive) is the repetition of a smaller primitive balanced binary word: let $0 < k < p$ and $GCD(k, p) = x$. $\forall u \in S^p_k$, $\exists v \in S^{p/x}_{k/x}$ such that $u = v^x$.

When the proposed algorithm meets a non-primitive balanced binary word, it considers the primitive balanced binary word imprinted into it. The execution is correct because when $u = v^x$, we have $u^\omega = (v^x)^\omega = v^\omega$.
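The balance property and the floor/ceiling bound on factors can be checked by brute force on short words. This is a minimal sketch with hypothetical function names: it enumerates the circular windows of a finite word, which cover every factor of its infinite repetition up to the word length (longer factors only add whole periods).

```python
def factor_counts(word, length):
    """Numbers of 1s found in the factors of the given length of word^omega (length <= len(word))."""
    doubled = word + word  # every circular window of word appears in word.word
    return {doubled[i:i + length].count("1") for i in range(len(word))}

def is_balanced(word):
    """Definition 37 by exhaustive comparison of equal-length factors."""
    for length in range(1, len(word)):
        counts = factor_counts(word, length)
        if max(counts) - min(counts) > 1:
            return False
    return True

print(is_balanced("1101010"))        # a word of length 7 with four 1s
print(is_balanced("1100110"))        # factors 11 and 00 differ by two
print(factor_counts("1101010", 3))   # compare with floor(3*4/7) and ceil(3*4/7)
```

The non-primitive word 110110 also passes the test, in line with item iv): it is the repetition of the primitive word 110.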



4.4 Transposition on balanced binary words 

Definition 40 defines a bijective function of transposition from $S^p_k$ to $S^p_k$. It requires some intermediate results.

Lemma 38 (Transposition in $S^p_k$). $\forall u \in S^p_k$ with k and p relatively prime, there exists a unique $\lambda$ such that $\tau(u, \lambda) \in S^p_k$.

Proof. If the transposition is applied to any 1 of $inf(S^p_k)$, the transpose is a lower word, which is consequently not balanced, except for the last bit of $inf(S^p_k)$: in this case, the transpose is $sup(S^p_k)$. This result is consistent modulo rotation. □



If k and p are not relatively prime, let $GCD(k, p) = x$. $\forall u \in S^p_k$, $u = v^x$ with $v \in S^{p/x}_{k/x}$. We define $\lambda = \lambda'$ such that $\lambda'$ is the unique location where $\tau(v, \lambda') \in S^{p/x}_{k/x}$.

Lemma 38 shows that $\lambda$ is the last position of $inf(S^p_k)$. Starting from this location, $\lambda$ can be found in every word of $S^p_k$.

Corollary 39. In $\rho^n(inf(S^p_k))$, $\lambda = p + n = n \mod p$.




We define the transposition function as the transposition applied on the bit $\lambda$ of a balanced binary word.

Definition 40 (The transposition function on balanced binary words).
We define the transposition function applied on balanced binary words as $\tau^n: S^p_k \to S^p_k$, with $\tau^0(u) = u$, $\tau(u) = \tau^1(u) = \tau(u, \lambda)$ where $\lambda$ is the same as in Lemma 38, $\tau^n(u) = \tau(\tau^{n-1}(u))$, and $\tau^{-n}(u) = v$ if and only if $\tau^n(v) = u$. If k and p are not relatively prime, let $GCD(k, p) = x$. $\forall u \in S^p_k$, $u = v^x$ and $\tau^n(u) = (\tau^n(v))^x$.

Example 41. $\tau^1(1101010) = 1011010$, $\tau^2(1010101) = 0101101$, $\tau^p(w) = w$, and $\tau(110110) = 101101$.

Lemma 42. The function $\tau^n$ is bijective.

Proof. Since $\tau(\rho^n(inf(S^p_k))) = \rho^n(sup(S^p_k))$, there is a one-to-one correspondence between the elements and their images through the $\tau$ function. □

4.5 Equivalence between rotation and transposition on balanced bi- 
nary words 

Theorem 44 presents our original result on balanced binary words. It states that, for any given balanced binary word u, the transpose of u is equivalent to the rotation of u with a spin $-\alpha$. Let us first define $\alpha$.

Definition 43 (The alpha coefficient). Let k, p be two relatively prime integers, $0 < k < p$. $\alpha$ is the inverse of $-k \mod p$. So we have $-k * \alpha = 1 \mod p$, and $\alpha$ is relatively prime with p.

Theorem 44. $\forall u \in S^p_k$, $\rho^{\alpha}(\tau(u)) = u$.

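Proof. We are going to prove that $u = u_1.0.1.u_2$ and $\rho^{\alpha}(u) = u_1.1.0.u_2$ ($u_1, u_2 \in \mathbb{B}^*$). This means that u is the transpose of $\rho^{\alpha}(u)$. So we compare u and $\rho^{\alpha}(u)$ bit-wise for $i \in [1, p]$.
$\rho^{\alpha}(u)(i) = u(i - \alpha) = \lfloor (i - \alpha) * k/p \rfloor - \lfloor (i - 1 - \alpha) * k/p \rfloor$.
$\alpha$ in $u(i - \alpha)$ is replaced by its value ($-\alpha * k = 1 \mod p$) and the equation is simplified into:
$u(i - \alpha) = \lfloor (i * k + 1)/p \rfloor - \lfloor ((i - 1) * k + 1)/p \rfloor$. On the other hand, $u(i) = \lfloor i * k/p \rfloor - \lfloor (i - 1) * k/p \rfloor$.
For $i * k \ne k - 1$ and $i * k \ne p - 1$ modulo p, $u(i - \alpha) = u(i)$, and
for $i * k = p - 1$ modulo p, $u(i - \alpha) = 1$ and $u(i) = 0$; moreover,
$(i + 1) * k = p - 1 + k = k - 1$ modulo p, and $u(i + 1 - \alpha) = 0$, $u(i + 1) = 1$.

□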

The proposed algorithm computes the schedule of each transition from the schedules of its parent transitions. These schedules are equivalent by rotation because they are all balanced. Thanks to Theorem 44, the rotation is used instead of the transposition in the schedule computation formulas. This simplification lightens the formulas and allows the correctness of the proposed algorithm to be checked.
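Theorem 44 can be verified exhaustively on a small set such as the words of length 7 with four 1s. The following sketch is an illustration, not the report's implementation: the helper names are hypothetical, the transpose is found by trying every circular swap of "10" into "01" and keeping the balanced result (whose uniqueness is assumed from Lemma 38), and `pow` with exponent -1 requires Python 3.8 or later.

```python
def rotate(word, spin):
    s = spin % len(word)
    return word[-s:] + word[:-s] if s else word

def is_balanced(word):
    n, doubled = len(word), word + word
    for length in range(1, n):
        counts = [doubled[i:i + length].count("1") for i in range(n)]
        if max(counts) - min(counts) > 1:
            return False
    return True

def alpha(k, p):
    """Inverse of -k modulo p, for k and p relatively prime."""
    return pow((-k) % p, -1, p)

def balanced_transpose(word):
    """tau(u): the circular swap of '10' into '01' that keeps the word balanced."""
    n, found = len(word), []
    for i in range(n):
        j = (i + 1) % n  # j wraps to the first letter when i is the last one
        if word[i] == "1" and word[j] == "0":
            letters = list(word)
            letters[i], letters[j] = "0", "1"
            candidate = "".join(letters)
            if is_balanced(candidate):
                found.append(candidate)
    if len(found) != 1:
        raise ValueError("expected exactly one balanced transpose")
    return found[0]

a = alpha(4, 7)
words = [rotate("1101010", n) for n in range(7)]
print(a, all(rotate(balanced_transpose(u), a) == u for u in words))
```

For (k, p) = (4, 7) the coefficient is 5, and rotating the transpose of each word of the orbit by 5 gives the word back.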

4.6 From word to schedule

The unitary forward rotation represents the effect of a latency on a transition schedule, while the unitary forward transposition represents the effect of a delay. Figure 6 focuses on two transitions of a 4-periodic MG with a period 7. The schedules of A and B are binary words of length 7 containing 4 bits with the value 1. In Figure 6-a, the schedule of B is the unitary rotation of the schedule of A because no delay is assigned to the place in-between. The arrows illustrate this rotation ($B(i + 1) = A(i)$, $\forall i \mod 7$). In Figure 6-b, two delays are assigned to the place in-between. B no longer computes all the tokens generated by A as soon as they are available: two of them are delayed. The schedule of B is the double transposition of the rotation of the schedule of A. The first diagonal arrow illustrates the rotation, the next two the transpositions. In Figure 6-c, thanks to Theorem 44, the succession of operations presented in Figure 6-b is replaced by the equivalent rotation of spin $1 - 2 * \alpha$, where 1 is the original rotation and $-2 * \alpha$ represents the two transpositions. For (k, p) = (4, 7), we have $\alpha = 5$ (Definition 43). So the spin of the rotation is 5 ($1 - 2 * \alpha = 1 - 2 * 5 = -9 = 5 \mod 7$).




[Figure 6 panels: a) Sched(B) = $\rho^1$(Sched(A)), with Sched(B) = 0101101; b) D(p) = 2, 0110101 → 1011010 → 1010110 → 1010101, Sched(B) = $\tau^2(\rho^1$(Sched(A))); c) D(p) = 2, Sched(B) = $\rho^5$(Sched(A)).]

Figure 6: a) The schedule of B is the unitary rotation of the schedule of A. b) The schedule of B is the double transposition of the rotation of the schedule of A because the place in between is a 2-delay place. c) Thanks to Theorem 44, the schedule of B in b) is the rotation of spin 5 of the schedule of A.
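The three panels of Figure 6 reduce to a single rotation whose spin depends on the number of delays in the place. A minimal sketch, assuming a latency of one per place (latencies expanded) and using the hypothetical helper name `successor_schedule`:

```python
def rotate(word, spin):
    s = spin % len(word)
    return word[-s:] + word[:-s] if s else word

def successor_schedule(schedule, delays, k):
    """Schedule of B from the schedule of A across a place holding `delays` delays.

    One unit of latency contributes a spin of 1; each delay contributes -alpha.
    """
    p = len(schedule)
    a = pow((-k) % p, -1, p)  # alpha of Definition 43 (Python 3.8+)
    return rotate(schedule, 1 - delays * a)

print(successor_schedule("1011010", 0, 4))  # panel a): plain unitary rotation
print(successor_schedule("0110101", 2, 4))  # panels b) and c): spin 1 - 2*5 = 5 mod 7
```

The second call returns 1010101, the last word of the chain 0110101 → 1011010 → 1010110 → 1010101 shown in panel b).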



5 Balanced scheduling of MG 

This section details the proposed algorithm, which computes an execution characterized by the following properties: i) the execution rate is maximal, ii) place sizes are minimal, and iii) after a guided initialization, the execution is ASAP.

Input: the proposed algorithm, presented in Algorithm 1, takes as input a live and strongly connected MG with a throughput less than or equal to 1. Section 5.3 discusses the application of the proposed algorithm to a simply connected MG.

Output: Algorithm 1 returns the computed execution along with the size of the places required for this execution.

The following notations are used in Algorithm 1:

• G is the MG in input and $M_0$ is its initial marking.

• D is the latest delays position (Definition 22).

• $Exec_{initial}$ is the initial guided execution of G from its initial marking to $M_{periodic}$.

• $M_{periodic}$ is the marking of G from which $Exec_{periodic}$ starts.

• $Exec_{periodic}$ is a balanced ASAP execution of G from the marking $M_{periodic}$.

• Sched(t) is the schedule of the transition t in $Exec_{periodic}$.

• The execution $Exec = Exec_{initial}.Exec_{periodic}$ is the output of the proposed algorithm.

• $C_{Exec}$ gives place sizes according to Exec (Definition 14).

We consider that the preliminary step of the proposed algorithm is the ℕ-equalization of the MG followed by the expansion of its latencies. ℕ-equalization is discussed in Section 3.7 and expansion of latencies is discussed in Section 3.6.



Algorithm 1 The proposed algorithm
Input: G with its initial marking $M_0$.
Output: The execution Exec and the place sizes $C_{Exec}$.

1. (k, p) ← compute_k_p(G)
2. D ← compute_D(G)
3. $Exec_{periodic}$ ← compute_Exec_periodic(G, D, k, p)
4. $M_{periodic}$ ← compute_M_periodic(G, $Exec_{periodic}$, p)
5. $Exec_{initial}$ ← compute_Exec_initial(G, $M_0$, $M_{periodic}$)
6. Exec ← $Exec_{initial}.Exec_{periodic}$
7. $C_{Exec}$ ← compute_C_Exec(G, D, k, p)
return (Exec, $C_{Exec}$)



5.1 Algorithm details 
5.1.1 Step 1: compute k and p 

The formula is given in [6]: $k = GCD(M_0(c))$ and $p = GCD(L(c))$, for all cycles c of the CSCCs. Step 1 requires the enumeration of all the elementary cycles. This enumeration has an exponential complexity with respect to the number of transitions. It dominates the overall complexity of the proposed algorithm.
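Once the cycles of the critical components are known, the formula of Step 1 is a pair of GCD computations. The sketch below assumes that the enumeration has already been done and that each cycle is given as a hypothetical (tokens, latency) pair; it does not address the exponential enumeration itself.

```python
from functools import reduce
from math import gcd

def compute_k_p(critical_cycles):
    """k = GCD of the token counts M0(c), p = GCD of the latencies L(c)."""
    tokens = [m for m, _ in critical_cycles]
    latencies = [l for _, l in critical_cycles]
    return reduce(gcd, tokens), reduce(gcd, latencies)

# Running example: a single critical cycle with 4 tokens and a latency of 7.
print(compute_k_p([(4, 7)]))
```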



5.1.2 Step 2: compute the latest delays position D 



D has to be the latest delays position (Definition 22) in order to build the ASAP execution $Exec_{periodic}$. Theorem 23 shows that the latest delays position can be deduced from any ASAP execution of G. Thus, Step 2 computes D from the ASAP execution of G. Step 2 has a polynomial complexity in the number of transitions. Algorithm 2 details Step 2.

Figure 7 presents D on the running example. The right-most cycle, $c_1$, is critical: it does not contain any delay. The left-most cycle, $c_2$, is not. The difference of firings over a period is $|c_1| * |c_2|_1 - |c_1|_1 * |c_2| = 7 * 2 - 4 * 3 = 2$. The places of $c_2$ that do not belong to $c_1$ should share 2 delays. The left-most and top-most place contains all these delays because, in the latest delays position, the delays have to occur as late as possible.



5.1.3 Step 3: compute $Exec_{periodic}$

Step 3 assigns a schedule to every transition with respect to D. Algorithm 3 details Step 3. It has a linear complexity in the number of transitions.

In Figure 8, Step 3 generates a balanced binary word $1101010 \in S^7_4$ because the MG is 4-periodic with a period 7. Step 3 assigns this word to a transition and computes the schedule of the other transitions using the rotation. The schedule of the 2-input transition (1010101) can be found from its right predecessor, $\rho^1(0101011)$, or from its left predecessor, $\rho^5(0110101)$. The spin of this last rotation is $5 = 1 - 2 * \alpha \mod p$. The place in-between the transitions contains 2 delays. Since $\alpha = 5$, $1 - 2 * \alpha = 1 - 2 * 5 = -9 = 5 \mod 7$.

Algorithm 2 compute_D
Input: G.
Output: D.

Run the ASAP execution of G.
for all p ∈ P do
    D(p) = sum of Delay(p, $j_0$ + i) over the instants i of one period, where $j_0$ is the length of the initial part.
end for
while D is not the latest delay position do
    for all t ∈ T do
        forwarded_delay = min(D(p) | ∀p ∈ $^\bullet t$)
        for all p ∈ $^\bullet t$ do
            D(p) −= forwarded_delay
        end for
        for all p ∈ $t^\bullet$ do
            D(p) += forwarded_delay
        end for
    end for
end while
return D

Algorithm 3 compute_Exec_periodic
Input: G, D, k, and p.
Output: $Exec_{periodic}$.

Let t ∈ T, Sched(t) ← get_a_word_in($S^{p/r}_{k/r}$) {with r = GCD(k, p).}
current_transition ← t
while ∃t' ∈ T such that Sched(t') is not defined do
    for all t' ∈ {(current_transition$^\bullet$)$^\bullet$} do
        Sched(t') ← $\rho^{1 - D(p) * \alpha}$(Sched(current_transition)) {p is the place between current_transition and t'.}
    end for
    current_transition ← t'
end while
return $Exec_{periodic}$

The consistency of this method is guaranteed because the number of delays for each cycle conforms to Theorem 20. Lemma 45 formalizes this result.

Lemma 45 (Creation of $Exec_{periodic}$). Step 3 is consistent.

Proof. Let $u \in S^p_k$ be a balanced binary word. The number of delays occurring on a cycle c during a period of execution is $n = M_0(c) * p - L(c) * k$. The latency on this same cycle is L(c).

If we impose the schedule of a transition t on c to Sched(t) = u and we propagate this schedule to the successors according to Step 3, then t will ultimately be reached again. The updated schedule of t will be $\rho^{L(c) - n * \alpha}(u)$. We know from Definition 43 that $\alpha * k = -1 \mod p$, so if we focus on the quantity $L(c) - n * \alpha$:

$L(c) - n * \alpha = L(c) - \alpha * (M_0(c) * p - L(c) * k) = L(c) - \alpha * M_0(c) * p + \alpha * L(c) * k$
$= L(c) - \alpha * M_0(c) * p - L(c) \mod p = -\alpha * M_0(c) * p \mod p = 0 \mod p$,

so the rotation is equivalent to $\rho^0$ modulo p. Consequently, the schedule of t remains the same: the method is consistent. □

Figure 7: D: The amount of delay is written within the place.

Figure 8: From the schedule's seed, Step 3 generates all other schedules through rotation. The schedule of the 2-input transition can be found from both its predecessors.
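The consistency argument of Lemma 45 can be replayed on the left cycle of the running example, which has a latency of 3, holds 2 tokens, and therefore carries 2 * 7 - 3 * 4 = 2 delays per period. The sketch below assumes a latency of one per place and a spin of one minus the number of delays times alpha for each place crossed; the function name and the list encoding of the cycle are hypothetical.

```python
def rotate(word, spin):
    s = spin % len(word)
    return word[-s:] + word[:-s] if s else word

def propagate_around_cycle(seed, delays, k):
    """Schedules met while walking once around a cycle, starting from `seed`.

    `delays` lists D(p) for the successive places of the cycle.
    """
    p = len(seed)
    a = pow((-k) % p, -1, p)  # alpha of Definition 43 (Python 3.8+)
    schedules = [seed]
    for d in delays:
        schedules.append(rotate(schedules[-1], 1 - d * a))
    return schedules

walk = propagate_around_cycle("0110101", [2, 0, 0], 4)
print(walk, walk[-1] == walk[0])
```

With a delay count that does not match the cycle, for instance `[1, 0, 0]`, the walk does not return to the seed, which is the inconsistency that Theorem 20 rules out.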



5.1.4 Step 4: compute $M_{periodic}$

Step 4 deduces $M_{periodic}$ from $Exec_{periodic}$. $M_{periodic}$ is not only the marking from which $Exec_{periodic}$ runs but also the marking generated by $Exec_{periodic}$ after a period of execution. Consequently, the last step of a period reaches $M_{periodic}$. The last bit of Sched($^\bullet p$) represents the activity of $^\bullet p$ at the last instant of the period. If it has been active, it has produced a token in p. Algorithm 4 details Step 4. It has a linear complexity in the number of places.



Algorithm 4 compute_M_periodic
Input: G, $Exec_{periodic}$, and p.
Output: $M_{periodic}$.

∀p ∈ P, Sched($^\bullet p$) = $u^\omega$ and Sched($p^\bullet$) = $v^\omega$
for all p ∈ P do
    $M_{periodic}(p)$ ← u(p) + [$\rho(u) < v$]
    {[$\rho(u) < v$] = 1 if $\rho(u) < v$ and 0 otherwise.}
end for
return $M_{periodic}$



$[\rho(u) < v] = 1$ means that one token is being delayed in the place at the current instant. $[\rho(u) < v]$ is always equal to 0 when D(p) = 0 because $v = \rho(u)$. When D(p) > 0, v is a transpose of $\rho(u)$. In the usual case, $\rho(u) > v$ because transposition shifts 1s to the right. But when the transposition occurs on the last bit of the word, the transpose gets a 1 in its first position and becomes higher than the original word. Thus, if a transposition occurs on the last bit, it means that a token is currently delayed in the place. Lemma 46 formalizes this intuition.

Lemma 46 (Presence of tokens in delayed places). Let p be a place of G such that D(p) = n > 0. Let u = Sched($^\bullet p$) and v = Sched($p^\bullet$). If $\rho(u) < v$, p is delaying a token in the marking $M_{periodic}$.

Proof. $v = \rho^{1 - n * \alpha}(u) = \tau^n(\rho(u))$. By definition, the transpose of a word is lower than the original word except when the last bit is transposed. In this last case, the transpose is higher than the original word. If $v > \rho(u)$ (with $v = \tau^n(\rho(u))$), at least one of the transpositions occurs on the last bit. The interpretation of this statement is that the firing of $p^\bullet$ was supposed to occur at the last instant of the period but has been delayed to the next one. The token related to this execution is currently in p. □

Figure 9 illustrates Step 4. The last bit of the schedule of a transition determines whether a token is present in its output place(s). The place with delays contains a regular token because the schedule of the predecessor ends with a 1, but it does not contain an extra token because $1010101 < \rho(0110101) = 1011010$.
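The per-place rule of Step 4 needs only the schedules of the two transitions around the place. A minimal sketch with a hypothetical function name, relying on the fact that Python compares equal-length strings of '0' and '1' lexicographically:

```python
def rotate(word, spin):
    s = spin % len(word)
    return word[-s:] + word[:-s] if s else word

def periodic_marking(pred_schedule, succ_schedule):
    """Tokens of a place in M_periodic: last bit of the producer plus [rho(u) < v]."""
    regular = int(pred_schedule[-1])                         # token produced at the last instant
    delayed = int(rotate(pred_schedule, 1) < succ_schedule)  # token held back by a delay
    return regular + delayed

# Delayed place of the running example: u = 0110101, v = 1010101.
print(periodic_marking("0110101", "1010101"))
# A place whose transposition hits the last bit: rho(u) is the lowest word, v the highest.
print(periodic_marking("1010110", "1101010"))
```

In the second call the producer was idle at the last instant, yet the place still holds one token: the one being delayed.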



The correctness of Step 4 is presented in Section 5.2. First, Lemma 51 proves that the marking $M_{periodic}$ is reachable from $M_0$. Then, Theorem 56 shows that the ASAP execution of G from $M_{periodic}$ is $Exec_{periodic}$.



Figure 9: Step 4 generates $M_{periodic}$ from $Exec_{periodic}$. The presence of an additional token in the delayed place is found using the function $[v > \rho(u)]$.

5.1.5 Step 5: compute $Exec_{initial}$

Algorithm 5 computes $Exec_{initial}$ based on integer linear programming. The optimization criterion is the minimization of the number of firings, because one cannot express linearly the minimization of the number of steps required to run $Exec_{initial}$. The mapping $F_{init}$ associates to each transition the number of firings required to reach $M_{periodic}$. The function build_execution builds $Exec_{initial}$ by simulating an ASAP execution of G where each transition t cannot be fired more than $F_{init}(t)$ times. The complexity of Step 5 depends upon the algorithm used to solve the system of linear inequalities. Lemma 47 shows the correctness of Step 5.



Algorithm 5 compute_Exec_initial
Input: G, $M_0$, and $M_{periodic}$.
Output: $Exec_{initial}$.

Cst = ∅ {Cst is the set of linear constraints}
for all t ∈ T do
    Cst += {$F_{init}(t) \ge 0$}
    for all p ∈ $t^\bullet$ do
        Cst += {$F_{init}({}^\bullet p) = F_{init}(p^\bullet) + M_{periodic}(p) - M_0(p)$}
    end for
end for
$F_{init}$ ← lp_solve(Cst, Min($\sum_{t \in T} F_{init}(t)$))
$Exec_{initial}$ ← build_execution($F_{init}$)
return $Exec_{initial}$



In Figure 3, $M_0$ is on the left. The 2-bit-long schedules attached to each transition form $Exec_{initial}$, leading to $M_{periodic}$ on the right.

Lemma 47 (Correctness of Step 5). Algorithm 5 computes a valid execution $Exec_{initial}$ reaching $M_{periodic}$.

Proof. Let us call $M_1$ the marking at the end of $Exec_{initial}$. $\forall p \in P$, $M_1(p) = M_0(p) - F_{init}(p^\bullet) + F_{init}({}^\bullet p) = M_0(p) - F_{init}(p^\bullet) + F_{init}(p^\bullet) + M_{periodic}(p) - M_0(p) = M_{periodic}(p)$. □
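For a connected MG, the equality constraints of Algorithm 5 fix the firing counts up to an additive constant, so the minimal non-negative solution can also be obtained by propagation. The sketch below swaps this propagation in for the LP solver and is not the report's implementation: the function name and the dictionary encoding of places as (producer, consumer) pairs are hypothetical, and connectivity of G is assumed.

```python
from collections import deque

def initial_firing_counts(places, m0, m_periodic):
    """Smallest non-negative firing counts F with F(producer) = F(consumer) + M_periodic(p) - M0(p).

    places: dict mapping each place name to its (producer, consumer) transitions.
    """
    adjacent = {}
    for name, (src, dst) in places.items():
        diff = m_periodic[name] - m0[name]
        adjacent.setdefault(src, []).append((dst, -diff))  # F(dst) = F(src) - diff
        adjacent.setdefault(dst, []).append((src, diff))   # F(src) = F(dst) + diff
    start = next(iter(adjacent))
    counts = {start: 0}
    queue = deque([start])
    while queue:
        t = queue.popleft()
        for other, delta in adjacent[t]:
            value = counts[t] + delta
            if other not in counts:
                counts[other] = value
                queue.append(other)
            elif counts[other] != value:
                raise ValueError("target marking is not reachable")
    lowest = min(counts.values())
    return {t: c - lowest for t, c in counts.items()}  # shift so the minimum is 0

# Ring a -> b -> c -> a: the token moves from place ab to place bc.
ring = {"ab": ("a", "b"), "bc": ("b", "c"), "ca": ("c", "a")}
print(initial_firing_counts(ring, {"ab": 1, "bc": 0, "ca": 0}, {"ab": 0, "bc": 1, "ca": 0}))
```

Only b fires, once, which moves the token from its input place to its output place. A target marking with a different number of tokens on the ring raises an error, in line with the reachability condition recalled in Section 5.2.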

According to [18], the maximum number of firings between two markings ($M_0$ and $M_{periodic}$ in our case) is in O(n^), where n is the number of transitions in the MG. We assume that the length of $Exec_{initial}$ is acceptable because: i) this bound is given in terms of number of firings, whereas $Exec_{initial}$ allows parallel firing of transitions; ii) the periodic execution $Exec_{periodic}$ covers a set of p markings, and the initial part can reach any of these markings, so the problem is equivalent to reaching the closest marking of $Exec_{periodic}$ instead of only $M_{periodic}$; iii) the cases where the upper bound is reached are extreme cases where all tokens have to shift to another place far from the initial one, or where the shift of one token implies the shift of all others. In $M_{periodic}$, the tokens are "spread equally" in the MG, so $M_{periodic}$ might be the most easily reachable marking.



5.1.6 Step 6: compute Exec 

Exec is composed of $Exec_{initial}$ followed by $Exec_{periodic}$. After the guided initialization, the execution is ASAP and repetitive. In Figure 4, the MG is in its initial marking. The execution, Exec, is represented by the ultimately k-periodic schedules attached to each transition.



5.1.7 Step 7: compute $C_{Exec}$

If a place does not contain any delay, every token reaching the place leaves it at the next instant. As long as such a place contains at most one token in $M_0$, its size is 1. Lemma 48 demonstrates that if a place contains delays, tokens are never delayed more than one consecutive instant because the MG is ℕ-equalized and the schedules are balanced. As a consequence, a place cannot accumulate more than two tokens.

Lemma 48 (Delayed place size is bounded by 2). According to Exec, the size of a place where delays occur is bounded by 2.

Proof. First, G is ℕ-equalized, so the number of delays per place is bounded by k. Secondly, since the execution is balanced, a token can be delayed only once in a row. Lastly, since the execution is k-periodic, there are (at most) k different tokens to delay. These conditions guarantee that a token cannot stay more than 2 instants in the place. Consequently, no accumulation of more than 2 tokens can occur. □

Even for delayed places, a size of two is required only if a token is delayed while another one reaches the place. Theorem 49 shows that a delayed place has a size of one when $D(p) \le p - k$, because delays occur first on the 1s which are followed by a 0. In Figure 7, all the places have a size of 1. In the delayed place p, $D(p) = 2 \le 7 - 4$.

Theorem 49 (Exact delayed place size). Let p be a place,

$C_{Exec}(p) = 1 \Leftrightarrow D(p) \le p - k$

Proof. First, if a place p with D(p) = n has a size of one, every other place p' with D(p') < n also has a size of one. If a place p with D(p) = m has a size of two, every other place p' with D(p') > m also has a size of two. This property is guaranteed by Lemma 38. In two different delayed places, the delayed tokens are the same modulo rotation. So the problem of calibrating the size of a place only depends upon the amount of delays in that place and not at all upon the location of these delays.

Let u = Sched($^\bullet p$) and v = Sched($p^\bullet$). A place p requires a size of two when a token is used after the next one has reached the place. Formally, there exists n such that $[v]_n > [u]_n$ (where $[u]_n$ is the position of the n-th 1 in u). v says when the current token is used, u says when a new token reaches p.

Let us assume that D(p) = p - k. We have $v = \rho^{1 - (p - k) * \alpha}(u) = \rho^{1 - p * \alpha + k * \alpha}(u) = u$, so $[v]_n > [u]_n$ never holds.

Let us assume that D(p) = p - k + 1. We have $v = \rho^{1 - (p - k + 1) * \alpha}(u) = \rho^{1 - p * \alpha + k * \alpha - \alpha}(u) = \tau(u)$, so $[\tau(u)]_n > [u]_n$ holds when n is the index of the delayed token. □

The following theorem proves that the proposed algorithm computes an execution with minimal place sizes, as claimed earlier.

Theorem 50 (Minimal size of the places). $C_{Exec}$ gives the minimal size of places.

Proof. When $D(p) \le p - k$, $C_{Exec}(p) = 1$, so it is minimal.

Let us now assume an ASAP execution Exec' from the marking M' reachable from $M_0$. Let us assume a place p' such that D(p') = p - k + 1. At most p - k tokens within a period can be delayed while no token follows. There remains at least 1 token that has to be delayed but that is followed by another token. In this last configuration, p' contains two tokens and thus the size of p' is at least 2. Consequently, $C_{Exec}(p)$ is also minimal when D(p') > p - k. □
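The threshold of Theorem 49 can be compared with a direct simulation of one period. The sketch below makes several assumptions that are not taken from the report: the marking of a place evolves as the previous marking plus the producer bit minus the consumer bit at each instant, the size of the place is the largest marking observed, the initial marking follows the rule of Algorithm 4, and the example uses the word 1110110 of length 7 with five 1s.

```python
def rotate(word, spin):
    s = spin % len(word)
    return word[-s:] + word[:-s] if s else word

def place_size(delays, k, p):
    """Closed form of Theorem 49: size 1 iff D(p) <= p - k."""
    return 1 if delays <= p - k else 2

def simulated_place_size(producer, delays, k):
    """Largest marking of a place over one period of the periodic execution."""
    p = len(producer)
    a = pow((-k) % p, -1, p)  # alpha of Definition 43 (Python 3.8+)
    consumer = rotate(producer, 1 - delays * a)
    marking = int(producer[-1]) + int(rotate(producer, 1) < consumer)
    peak = marking
    for i in range(p):
        marking += int(producer[i]) - int(consumer[i])
        peak = max(peak, marking)
    return peak

for d in range(5):
    print(d, simulated_place_size("1110110", d, 5), place_size(d, 5, 7))
```

For (k, p) = (5, 7), both computations switch from a size of 1 to a size of 2 once the place holds more than p - k = 2 delays.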

5.2 Correctness of Step 4

Let us first prove the reachability of $M_{periodic}$ from $M_0$; then we prove that the ASAP execution from $M_{periodic}$ is $Exec_{periodic}$.

5.2.1 Reachability of $M_{periodic}$ from $M_0$

Lemma 51 (Reachability of $M_{periodic}$ from $M_0$). $M_{periodic}$, as computed in Step 4, is reachable from $M_0$.

Proof. According to [18], both markings are mutually reachable if and only if, for each cycle of the MG, the two markings have the same number of tokens. Now, let us prove that $M_{periodic}$ and $M_0$ respect this condition.

First, Lemma 53 considers that all the delays of a cycle are assembled in the same place and proves that the condition holds. Lemma 53 requires Lemma 52. Then Lemma 54 generalizes Lemma 53 to any allocation of delays in a cycle. □

If all the delays are assembled in the same place p, $M_{periodic}(c)$ is equal to the number of 1s in the suffix of length L(c) of Sched($p^\bullet$), plus possibly one delayed token, because the schedules are, in such a case, elementary rotations of the previous ones and the bit of index p says whether a token is present in the output place. We have seen in Section 4.3 that the number of 1s in a factor of length L(c) of a balanced binary word is either $\lfloor L(c) * |u|_1/|u| \rfloor$ or $\lceil L(c) * |u|_1/|u| \rceil$. Lemma 52 proves that if the suffix of length L(c) has $\lfloor L(c) * |u|_1/|u| \rfloor$ 1s, p is currently delaying a token. Otherwise, p is not. Consequently, the number of tokens in c is always $\lceil L(c) * |u|_1/|u| \rceil$. Lemma 53 concludes that if the MG is equalized, $M_0(c) = \lceil L(c) * |u|_1/|u| \rceil$ also.

Lemma 52 (Suffixes and lexicographic order in $S^p_k$). Let $u \in S^p_k$ and $j, l \in \mathbb{N}$ such that $0 < j < l$ and $k > j * p - k * l > 0$. We note $n = j * p - k * l$.

There exist n balanced binary words $v \in O(u)$ such that $|suffix(v, l)|_1 = \lfloor l * k/p \rfloor$ (suffix(v, l) is the suffix of v of length l). Moreover, these n words are the highest according to the lexicographic order.

Proof. Consider the word $u^l$. By definition, $slope(u^l) = slope(u) = k/p$. $u^l$ can be sliced in p factors of length l. Each factor is different from the others and matches a suffix of length l of some $w \in O(u)$. If the number of factors containing $\lfloor l * k/p \rfloor$ 1s is different from n, $slope(u^l)$ cannot be k/p.

Moreover, if $|suffix(v, l)|_1 = \lfloor l * k/p \rfloor$, then $|prefix(v, p - l)|_1 = k - \lfloor l * k/p \rfloor$. Likewise, if $|suffix(v, l)|_1 = \lceil l * k/p \rceil$, then $|prefix(v, p - l)|_1 = k - \lceil l * k/p \rceil$. A word with more 1s in its prefix is higher than another with fewer 1s according to the lexicographic order. □
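The counting claim of Lemma 52 can be observed on the orbit of 0110101. In this sketch (hypothetical function name), j is taken as the ceiling of l * k/p, so that n = j * p - k * l, and only suffix lengths for which n stays below k are examined.

```python
def rotate(word, spin):
    s = spin % len(word)
    return word[-s:] + word[:-s] if s else word

def low_suffix_words(word, l):
    """Sorted orbit of `word`, and its members whose suffix of length l has floor(l*k/p) ones."""
    p, k = len(word), word.count("1")
    low = (l * k) // p
    ordered = sorted({rotate(word, n) for n in range(p)})  # lexicographic order
    return ordered, [w for w in ordered if w[-l:].count("1") == low]

for l in (1, 3):
    ordered, low = low_suffix_words("0110101", l)
    j = -(-l * 4 // 7)  # ceil(l*k/p) with k = 4, p = 7
    print(l, j * 7 - 4 * l, low, low == ordered[-len(low):])
```

For l = 3 the two highest words of the orbit are selected (n = 2 * 7 - 4 * 3 = 2), and for l = 1 the three highest (n = 7 - 4 = 3).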

Lemma 53 (Reachability of $M_{periodic}$ from $M_0$ in the simple case). Let c be a cycle of G such that all the delays occurring in c are assembled in the place p. We have $M_{periodic}(c) = M_0(c)$.

Proof. Let us call u the schedule of $p^\bullet$. The number of tokens in c is $M_{periodic}(c) = \sum_{i=0}^{L(c)-1} u(p - i) + [u > \rho^{D(p) * \alpha}u]$.

$\sum_{i=0}^{L(c)-1} u(p - i) = |suffix(u, L(c))|_1$. Since u is balanced, $\lfloor L(c) * k/p \rfloor \le |suffix(u, L(c))|_1 \le \lceil L(c) * k/p \rceil$.

Case 1: if $|suffix(u, L(c))|_1 = \lfloor L(c) * k/p \rfloor$, u is one of the D(p) highest words of O(u) (Lemma 52). Consequently, $u > \rho^{D(p) * \alpha}u$, because a rotation of $\alpha$ increases the value of the word according to the lexicographic order, but if the highest word is reached, another rotation of $\alpha$ gives the lowest. So $[u > \rho^{D(p) * \alpha}u] = 1$ and $M_{periodic}(c) = \lfloor L(c) * k/p \rfloor + 1 = \lceil L(c) * k/p \rceil$ (in the case $D(p) \ne 0$, p does not divide k * L(c)).

Case 2: if $|suffix(u, L(c))|_1 = \lceil L(c) * k/p \rceil$, u is not one of the D(p) highest words of O(u) (Lemma 52). Consequently, $[u > \rho^{D(p) * \alpha}u] = 0$, and $M_{periodic}(c) = \lceil L(c) * k/p \rceil$ also.

Conclusion: since G is ℕ-equalized, $M_0(c)/L(c) \ge k/p > M_0(c)/(L(c) + 1)$. So $(k * L(c) + k)/p > M_0(c) \ge k * L(c)/p$. By definition of the ℕ-equalization, the solution always exists and is unique: $\lceil L(c) * k/p \rceil$. □
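The token count of Lemma 53 can be evaluated for every possible schedule of the transition that follows the delayed place. The sketch below is an illustration under assumptions: the indicator is read as a comparison of u with its rotation of spin D(p) times alpha, the function name is hypothetical, and the example is the left cycle of the running example (latency 3, 2 delays, k = 4, p = 7).

```python
def rotate(word, spin):
    s = spin % len(word)
    return word[-s:] + word[:-s] if s else word

def tokens_on_cycle(u, latency, delays, k):
    """1s in the suffix of length `latency` of u, plus one if the delayed place holds back a token."""
    p = len(u)
    a = pow((-k) % p, -1, p)  # alpha of Definition 43 (Python 3.8+)
    suffix_ones = u[-latency:].count("1")
    delayed = int(u > rotate(u, delays * a))
    return suffix_ones + delayed

orbit = [rotate("1010101", n) for n in range(7)]
print([tokens_on_cycle(u, 3, 2, 4) for u in orbit])
```

Every rotation gives the same count, 2, which is both the ceiling of 3 * 4/7 and the number of tokens of this cycle in the initial marking.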

In Lemma 53, delays can occur in only one place, whereas in Lemma 54 every place can contain delays, and each of them might be delaying a token in $M_{periodic}$. In this lemma, we give the formula to compute $M_{periodic}$ from a place $p_0$ that we consider as the first place of the cycle; then we prove that if a delay is shifted to the last place of the cycle, the number of tokens in the cycle remains the same. Thanks to this result, we can shift all the delays into the last place and conclude that the number of tokens found in Lemma 53 is applicable to the general case. The invariance of the number of tokens under the shift operation is proven by considering the last places of the cycle such that the first and the last of this sequence of places contain delays but none of the others in-between does. In such a case, the effect of the shift operation on the formula to compute $M_{periodic}$ can be analyzed locally.

Lemma 54 (Reachability of $M_{periodic}$ from $M_0$ in the general case). For all cycles c, $M_{periodic}(c) = M_0(c)$.

Proof. Let c be a cycle of G. The places of c are {po,pi, ■■■,Pl(c}-i}- We note u the schedule of 
the transition *pq. 

$$M_{periodic}(c) = \sum_{i=0}^{L(c)-1} \Big( u\big(p - (i - (D(p_0) + \dots + D(p_i)) * a)\big) + \big[\rho^{\,i+1-(D(p_0)+\dots+D(p_{i+1}))*a}u > \rho^{\,i+1-(D(p_0)+\dots+D(p_i))*a}u\big] \Big)$$

Let $i_0$ be such that $\forall i \in ]i_0, L(c) - 1]$, $D(p_i) = 0$, and let us focus on the last few terms of this sum such that $i_0 < i \leq L(c) - 1$ (in the worst case, $i_0 = L(c) - 2$ and only the last term of the sum is there). The following equality is going to be proved for these terms only:

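$$\big[\rho^{\,i_0-(D(p_0)+\dots+D(p_{i_0}))*a}u > \rho^{\,i_0-(D(p_0)+\dots+D(p_{i_0-1}))*a}u\big] \quad (A)$$

$$+ \sum_{i=i_0}^{L(c)-1} u\big(p - (i - (D(p_0) + \dots + D(p_i)) * a)\big) \quad (B)$$

$$+ \big[u > \rho^{D(p_{L(c)-1})*a}u\big] \quad (C)$$

$$= \big[\rho^{\,i_0-(D(p_0)+\dots+D(p_{i_0})-1)*a}u > \rho^{\,i_0-(D(p_0)+\dots+D(p_{i_0-1}))*a}u\big] \quad (A')$$

$$+ \sum_{i=i_0}^{L(c)-1} u\big(p - (i - (D(p_0) + \dots + D(p_i) - 1) * a)\big) \quad (B')$$

$$+ \big[u > \rho^{(D(p_{L(c)-1})+1)*a}u\big] \quad (C').$$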

There are only three cases to study to prove this property:

• When (A) is equal to 1 but (A') is equal to 0, the first term of (B) is equal to 0 and the first term of (B') is equal to 1. If the first place delays a token, (A) = 1, but no longer does after the shift, (A') = 0, the token has been computed instead of being delayed and it then appears in the next place, (B') = 1. All the other terms of the sum are the same.

• When (C) is equal to 0 but (C') is equal to 1, the last term of (B) is equal to 1 and the last term of (B') is equal to 0. If the last place does not delay any token, (C) = 0, but does after the shift, (C') = 1, this token was in the last but one place, (B) = 1, and is now in the last one, (B') = 0. All the other terms of the sum are the same.

• In every other possible case, (A) equals (A'), (B) equals (B'), and (C) equals (C').

Thanks to this property, we know that the number of tokens in $c$ is the same wherever the delays are in the cycle. So the result found in Lemma 53 is applicable to the general case. □



5.2.2 Validity of $Exec_{periodic}$ from $M_{periodic}$

Lemma 55 (A step of execution from $M_{periodic}$). Let $M_1$ be the marking resulting from a step of ASAP execution from $M_{periodic}$, and let $M_1'$ be the marking resulting from a step of $Exec_{periodic}$ from $M_{periodic}$. Then $M_1 = M_1'$.

Proof. In an ASAP execution, a transition $t$ executes if and only if all its incoming places contain a token. In $M_{periodic}$, the place ${}^{\bullet}t$ contains a token if and only if $Sched({}^{\bullet\bullet}t)(p) = 1$ or $[\rho^{1}(Sched({}^{\bullet\bullet}t)) < Sched(t)]$. In the first step of $Exec_{periodic}$, a transition $t$ executes if and only if $Sched(t)(1) = 1 \Leftrightarrow \rho^{-1}(Sched(t))(p) = 1 \Leftrightarrow Sched({}^{\bullet\bullet}t)(p) = 1$ or $[\rho^{1}(Sched({}^{\bullet\bullet}t)) < Sched(t)]$. The conditions of execution are the same. If the same transitions are fired according to an ASAP execution or $Exec_{periodic}$, then the resulting markings are the same. □

Theorem 56 (Validity of $Exec_{periodic}$). The ASAP execution of $G$ from the marking $M_{periodic}$ is $Exec_{periodic}$.



Proof. Step 3 is based on the assignment of a schedule by a randomly chosen balanced binary word from $S$. Lemmas 45, 51 and 55 also hold for any other balanced binary word from $S$, since all the words of $S$ are equivalent by rotation. Step 4 gives all the successive markings of $Exec_{periodic}$ when Step 3 is initiated with, successively, all the words of $S$. For each of these markings, Lemma 55 proves that the next marking is reachable through ASAP execution. Consequently, from $M_{periodic}$, and after $p$ steps of execution, $Exec_{periodic}$ reaches $M_{periodic}$. □
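
The step semantics used in Lemma 55 and Theorem 56 can be illustrated with a small simulation. The sketch below is not the proposed algorithm: it only fires, at each step, every transition whose incoming places all hold a token, assuming places of latency one and unbounded capacity. The function names (`asap_step`, `asap_schedules`), the ring of five transitions and its hand-picked marking are hypothetical.

```python
def asap_step(places, marking):
    """One ASAP step. places is a list of (source, destination) transitions,
    marking[i] is the number of tokens in places[i]."""
    transitions = {t for arc in places for t in arc}
    # A transition is enabled when every incoming place holds a token.
    enabled = {
        t for t in transitions
        if all(marking[i] > 0 for i, (_, dst) in enumerate(places) if dst == t)
    }
    new_marking = list(marking)
    for i, (src, dst) in enumerate(places):
        if dst in enabled:
            new_marking[i] -= 1  # token consumed by the firing destination
        if src in enabled:
            new_marking[i] += 1  # token produced by the firing source
    return enabled, new_marking


def asap_schedules(places, marking, steps):
    """Run several ASAP steps and record one binary word per transition."""
    transitions = sorted({t for arc in places for t in arc})
    words = {t: "" for t in transitions}
    for _ in range(steps):
        enabled, marking = asap_step(places, marking)
        for t in transitions:
            words[t] += "1" if t in enabled else "0"
    return words, marking


# Hypothetical ring: place i goes from transition i to transition (i + 1) % 5.
ring = [(i, (i + 1) % 5) for i in range(5)]
words, final_marking = asap_schedules(ring, [1, 0, 1, 0, 0], 5)
```

On this ring with two tokens, the marking after five steps is the initial one, and every transition fires twice per period with a schedule that is a rotation of the same binary word (for instance 00101 for transition 0 and 10010 for transition 1).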



5.3 Extension to the simply connected case 

As we have seen in Proposition 18, one cannot guarantee that an ASAP and bounded execution exists for a given simply connected MG. Since a System-on-Chip cannot be designed with unbounded memories, the extension of the proposed algorithm to the simply connected case preserves boundedness at the expense of the ASAP property. The maximum execution rate is still preserved, but the minimality of the size of places is lost.

A simply connected MG can be transformed into a strongly connected one by adding feedback paths. Thus, the proposed algorithm can be applied. To do so, we add to the MG some feedback paths which bind all the components together. The functional behavior of the system is preserved, but its scheduling is over-constrained by the added feedback paths, i.e. adding different feedback paths implies a different execution computed by the proposed algorithm. These feedback paths act as synchronization barriers.

There are different algorithmic solutions to realize the transformation; however, the added feedback paths should not create a cycle with a throughput lower than the critical one in the original MG. Otherwise, the maximal execution rate will not be achieved. It is easy to prove that the marking and the latency of the added feedback paths can always be adjusted so that the created cycles have a non-critical throughput.
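
The adjustment of the marking of a feedback path can be sketched numerically. The snippet below assumes that the throughput of a cycle is its number of tokens divided by its latency, and computes how many tokens a feedback path needs so that the cycle it closes is not slower than the critical throughput $k/p$. The function name `min_feedback_tokens` and the numbers used are hypothetical.

```python
from fractions import Fraction
from math import ceil


def min_feedback_tokens(critical, fwd_tokens, fwd_latency, fb_latency):
    """Smallest marking of a feedback path such that the created cycle has a
    throughput (tokens / latency) not lower than the critical throughput.

    critical    -- critical throughput k/p of the original MG (a Fraction)
    fwd_tokens  -- tokens already present on the forward path
    fwd_latency -- latency of the forward path
    fb_latency  -- latency of the added feedback path
    """
    cycle_latency = fwd_latency + fb_latency
    needed = ceil(critical * cycle_latency) - fwd_tokens
    return max(needed, 0)


critical = Fraction(2, 5)
tokens = min_feedback_tokens(critical, fwd_tokens=1, fwd_latency=4, fb_latency=1)
cycle_throughput = Fraction(1 + tokens, 4 + 1)
```

With a critical throughput of 2/5, a forward path holding one token with latency 4 and a feedback path of latency 1, one token on the feedback path is enough: the created cycle then has throughput 2/5. Exact fractions are used to avoid rounding errors in the comparison.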

The minimality of the size of the places is guaranteed for the original SCCs, but the size of the places on the original DAG depends upon the added feedback paths. One may find another set of feedback paths such that the size of places on the original DAG is smaller. We have not yet studied this optimization.

Open MG. If a simply connected MG is open, one can consider that the system has global input(s) and output(s). In order to schedule the MG, it is transformed into a strongly connected one. Consequently, the MG becomes closed. The run of the proposed algorithm then returns a schedule for every source and sink. The schedule of a sink says when the system produces an output token and the schedule of a source says when the system consumes an input token.
Thus, the concerned input token has to be present when required. In pij], we state that the 
execution rates of the feeder and eater have to be the same in order to calibrate the capacity of 
the "interconnection" place with a finite value and thus ensure on-demand token availability. In [16], the authors thoroughly study the sizing of buffers between clocked systems.

The AES example. Figure 10 presents an implementation of the AES encryption standard. The MG has been represented using K-Passa (K-Periodic Asap Static Schedule Analyser) [21]. K-Passa implements the proposed algorithm as well as the $\mathbb{N}$-equalization. The circles represent the transitions of the system. The arrows represent the sequences (arc → place → arc) in between two transitions. The two leftmost transitions, called key and word, are sources (the local loop has been added for simulation purposes). The central transition, called output word, is a sink. The schedule attached to each transition is the one computed by the proposed algorithm. The guided initialization has length 1, then the behavior is 1-periodic with period 6. Every place has size one. The only place where a delay occurs is the one between word and mux (where a small square appears); however, a size of one is still enough.

As one can see, the AES example is a simply connected graph. In order to run the proposed 
algorithm, two paths from the sink to each of the sources have been added to the system. 
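
The closing of an open MG by feedback paths can be sketched as follows. This is a minimal illustration, not the transformation implemented in K-Passa: it assumes that sources are the transitions without incoming arcs and sinks those without outgoing arcs, adds one arc from every sink to every source, and checks strong connectivity by reachability. The function names and the four-transition graph are hypothetical; only the transition names key, word, mux and output word are taken from the AES example, and the local loops of the sources are left out.

```python
def add_feedback_arcs(arcs):
    """Return the arcs plus one feedback arc from every sink to every source."""
    nodes = {n for arc in arcs for n in arc}
    sources = sorted(nodes - {dst for _, dst in arcs})  # no incoming arc
    sinks = sorted(nodes - {src for src, _ in arcs})    # no outgoing arc
    return list(arcs) + [(sink, source) for sink in sinks for source in sources]


def reachable(arcs, start):
    """Set of nodes reachable from start by following the arcs."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for src, dst in arcs:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def is_strongly_connected(arcs):
    """True when every node reaches, and is reached from, an arbitrary node."""
    nodes = {n for arc in arcs for n in arc}
    start = next(iter(nodes))
    reversed_arcs = [(dst, src) for src, dst in arcs]
    return reachable(arcs, start) == nodes and reachable(reversed_arcs, start) == nodes


open_mg = [("key", "mux"), ("word", "mux"), ("mux", "output_word")]
closed_mg = add_feedback_arcs(open_mg)
```

On this toy graph, two feedback arcs are added (from output_word to key and to word), and the resulting graph is strongly connected whereas the original one is not. The marking and latency of the added paths still have to be chosen so that the created cycles are not critical.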

6 Results and discussion 

This paper proposes an algorithm to statically schedule any live and strongly connected MG with a throughput less than or equal to one. The proposed algorithm computes the balanced ASAP execution where the execution rate is maximal and place sizes are minimal. Moreover, a transformation has been proposed to change a simply connected MG into a strongly connected MG such that the proposed algorithm can be applied.

In the domain of System-on-Chip design, the proposed algorithm is used to schedule applications which are subject to the problem of long wire latency. Compared with our approach, latency insensitive design is less strict about the constraint on the availability of data on global inputs. It is a purely dynamic solution, but the cost of this dynamicity is the duplication of every data path in the circuit and the replacement of every simple register by a two-sized register to manage the dynamic communication and computation protocol. This difference makes our approach better suited to pure data flow systems.

Figure 10: The MG presents an implementation of the AES encryption standard.